Friday, August 25, 2006

Open mouth, insert money #2

I've been cutting back my online debating recently in favour of doing some actual learning wrt computational biology. However, one culture-wars blog I have kept visiting is Exile From Groggs, which is the most sane ID blog I've found to date.

In particular, it's occasionally possible to convince Paul (the blog owner) of something, given sufficient effort. This makes him practically unique in the world of blogging, let alone the ID community's segment thereof. And it's on this note I wish to speak.

I recently spent some effort trying to figure out exactly which parts of evolutionary theory Paul had a problem with. One of the issues that came up was that, whilst he was broadly convinced that evolutionary processes such as genetic algorithms and real-world evolution could optimise very simple systems, he wasn't sure how they'd handle something as complicated as proteins, where the search spaces can be Bad And Wrong. The challenge was laid down and accepted: demonstrate that evolutionary algorithms could work in the context of proteins.

This challenge actually has two major components. Firstly, I need to determine a sequence->fitness mapping for a given selection of proteins. Secondly, I need to implement an algorithm that uses both this mapping and biologically realistic reproduction and mutation to simulate the real-world evolutionary optimisation of the protein within a given protein-space*. The first part is by far the more difficult.
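To make the second component concrete, here is a minimal Python sketch of such a loop. It is only an illustration under loud assumptions: `toy_fitness` is a stand-in that scores similarity to a made-up target sequence (the real sequence->fitness mapping is exactly the hard part), mutation is limited to point substitutions, and the names `evolve`, `mutate`, `rate` and `pop_size` are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard residues


def toy_fitness(seq, target):
    """Placeholder sequence->fitness mapping (an assumption, not real biology):
    the fraction of positions at which seq matches a chosen target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rate, rng):
    """Point substitutions only: each residue is replaced by a random one
    with probability `rate`. Insertions and deletions are ignored here."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def evolve(start, fitness, generations=200, pop_size=50, rate=0.02, seed=0):
    """Evolve a population of copies of `start` and return the fittest sequence."""
    rng = random.Random(seed)
    population = [start] * pop_size
    for _ in range(generations):
        # Selection: the fitter half of the population survives.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Reproduction with mutation: survivors are kept unchanged and the
        # rest of the population is refilled with mutated copies of them.
        children = [
            mutate(rng.choice(parents), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    target = "MKTAYIAKQR"  # invented sequence, purely for illustration
    best = evolve("A" * len(target), lambda s: toy_fitness(s, target))
    print(best, toy_fitness(best, target))
```

Because the surviving parents are carried over unchanged, the best fitness in the population can never go down from one generation to the next. Swapping `toy_fitness` for a more realistic mapping leaves the loop itself untouched.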

I threw out a bunch of suggestions as to how this could be computationally modelled (rather than having to produce every single possible protein variant in the lab), and the option that seems to have been settled on is:

I work backwards by looking at an existing protein family, comparing the efficacy of the various proteins, and assuming that everything similar to one of these is also effective. I then use some kind of sequence-based active-site-detecting process to fine-tune my guesstimate of the effectiveness of the similar proteins.
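One way this option might be sketched in Python is shown below. Everything specific here is an assumption for illustration: the sequences and efficacy scores are invented, the family members are taken to be pre-aligned and of equal length, similarity is plain percentage identity, and "active-site detection" is reduced to a hypothetical list of positions that must match a known family member.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def estimate_fitness(candidate, family, active_site, min_identity=0.7):
    """Guess a candidate's fitness from known family members.

    family       -- dict mapping known sequence -> measured efficacy (0..1)
    active_site  -- positions assumed to be essential (hypothetical input)
    min_identity -- similarity below this is treated as "unknown", scored 0
    """
    best = 0.0
    for member, efficacy in family.items():
        if len(member) != len(candidate):
            continue  # this sketch only handles pre-aligned, equal-length sequences
        # A candidate that disturbs the assumed active site gets no credit
        # from this family member.
        if any(candidate[i] != member[i] for i in active_site):
            continue
        similarity = identity(candidate, member)
        if similarity >= min_identity:
            # Scale the member's efficacy by how similar the candidate is.
            best = max(best, efficacy * similarity)
    return best


if __name__ == "__main__":
    # Invented data, purely for illustration.
    family = {"MKTAYIAKQR": 1.0, "MKSAYLAKQR": 0.6}
    print(estimate_fitness("MKTAYIAKQK", family, active_site=[3, 4]))
```

Note that anything falling below `min_identity` for every family member scores zero, whether or not it would really work, so a scheme like this errs on the side of rejecting proteins it knows nothing about.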

In bioinformatics, this sort of thing can be worked through quite easily, but it's all too simple to produce hideously wrong answers. In particular, in this case it'll be fairly easy to determine which proteins will work well - the challenge will be avoiding false negatives.

More after the break.

* Syntactic note: we've been referring to the space of protein sequences that perform acceptably at a given task as the protein-space of that task. If we talk about the protein-space of a protein, we're talking about all the proteins that accomplish one of the same tasks as that protein in roughly the same fashion as that protein. This does not include wildly different proteins that happen to perform the same task; we're concerned primarily with families of similar proteins.
