Why so uptight?
The reason I think it's important to scope this challenge out in detail beforehand can be expressed in two words: goalpost shifting. Now, Paul is very good on this front compared to the vast majority of online debaters, but even he has his moments, and I'm sure I do too. Scoping everything out in advance means that neither of us can get confused (accidentally or otherwise) about what we were setting out to demonstrate. As I know from long hours of arguing with my sister, this can save much acrimony in later life.
The proof of the pudding
What is it we're setting out to achieve? The goal of this challenge is to answer the concerns Paul expressed in the following comment:
"When a protein emerges ... it will tend to improve its performance towards its local maximum." I am happy with this in some contexts - where there is strong selection. But I would like to see this at least demonstrated in representative computer models before I would accept that this works in nature.
This challenge (there may be others) is therefore aimed specifically at testing whether evolutionary processes can efficiently optimise the functionality of a protein. This won't be straightforward - the production of proteins from DNA sequences is not terribly easy to compute, and the effect of those proteins on their organism even less so. As such, we'll need to break the problem down a bit (see later).
The groundrules
1) This challenge will only be considering the protein-space of one protein task. This could be, for example, the ability to catalyse a given reaction. Selection of the task will be contingent on the ability to find both a number of different proteins in the task's protein-space and a means of determining success at the task. This challenge will not consider:
a) the ability of organisms' genomes to find the protein-space in the first place
b) the origins of the genome itself
c) [more negatives will be added here as necessary]
2) This challenge will proceed using only biologically-realistic information:
a) All properties of the GA, however minor, will be accompanied by journal or textbook references supporting their acceptability
b) The model used for determining the efficacy of proteins will be tested for accuracy before use. If it's not accurate, I'll go back to the drawing board and pick a more accurate but harder option.
3) On completion, the resulting GA will be initialised with a population of proteins that are distinctly suboptimal at their task (the sketch after this list shows roughly what that set-up might look like)
4) The party whose conclusions are not supported by the program's results will (if they so wish) have two weeks to pinpoint any unrealistic components, citing academic papers where appropriate. If any are found, the GA will be changed as necessary and rerun. Whichever party loses that rerun will then have one week to point out inaccuracies.
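To make rules 2a and 3 a bit more concrete, here's a very rough sketch of the sort of bookkeeping I have in mind: every GA parameter carries a slot for its supporting reference, and the initial population is built from deliberately poor performers. The names and numbers below are placeholders for illustration only, not the values the challenge will actually use.

```python
from dataclasses import dataclass

@dataclass
class GAParameter:
    name: str
    value: float
    reference: str  # journal/textbook citation justifying the value (rule 2a)

# Placeholder parameters - each one must eventually point at a real source.
PARAMETERS = [
    GAParameter("per_base_mutation_rate", 1e-8, "CITATION NEEDED"),
    GAParameter("population_size", 1000, "CITATION NEEDED"),
    GAParameter("generations", 5000, "CITATION NEEDED"),
]

def make_initial_population(seed_protein: str, size: int) -> list[str]:
    """Rule 3: start from proteins that are distinctly suboptimal at the task.

    Faked here with copies of a single seed sequence; the real run would use
    a set of variants known to perform poorly.
    """
    return [seed_protein] * size
```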
The breakdown
I've already discussed the primary division between setting up the fitness function and incorporating it into a GA. The hard part here is the first bit, which I see as breaking down into three essential components:
1) Figuring out what a given protein's physical properties (shape, configuration, charge distribution) are
2) Figuring out how they'll perform with respect to the protein-space's assigned task
3) Figuring out what effect that'll have on the whole organism
That effect (fitness) can then be used to determine each organism's survival chances, as is necessary for the GA to function.
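To illustrate, here's a skeletal version of that three-stage pipeline. Each stage is just a stub - the real implementations are precisely the hard work described above, so nothing here should be read as a claim about how they'll actually be built.

```python
def predict_structure(sequence: str) -> dict:
    """Stage 1: estimate the protein's physical properties
    (shape, configuration, charge distribution) from its sequence."""
    raise NotImplementedError("structure prediction model goes here")

def score_task_performance(structure: dict) -> float:
    """Stage 2: how well does a protein with these properties perform
    the chosen task (e.g. catalysing the selected reaction)?"""
    raise NotImplementedError("task-specific scoring goes here")

def organismal_fitness(task_score: float) -> float:
    """Stage 3: translate task performance into an effect on the
    whole organism's survival chances."""
    raise NotImplementedError("organism-level model goes here")

def fitness(sequence: str) -> float:
    """The full pipeline: sequence -> properties -> task score -> fitness."""
    return organismal_fitness(score_task_performance(predict_structure(sequence)))
```

Keeping the three stages as separate functions also means each one can be tested for accuracy on its own, which is what ground rule 2b requires.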
Once I've done that I'll go into more detail on the creation of the GA.
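As a preview of the shape that GA is likely to take, the loop below shows fitness-proportional survival plus mutation, repeated over many generations. The mutate() operator and the parameters are stand-ins, not the biologically-referenced versions rule 2 demands.

```python
import random

def mutate(sequence: str, rate: float) -> str:
    """Placeholder point-mutation operator over the amino-acid alphabet."""
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    return "".join(random.choice(alphabet) if random.random() < rate else aa
                   for aa in sequence)

def run_ga(population: list[str], fitness, generations: int, rate: float) -> list[str]:
    """Fitness-proportional survival plus mutation, applied each generation."""
    for _ in range(generations):
        scores = [fitness(p) for p in population]
        total = sum(scores)
        # Each protein's chance of leaving offspring scales with its fitness.
        weights = [s / total for s in scores] if total > 0 else None
        parents = random.choices(population, weights=weights, k=len(population))
        population = [mutate(p, rate) for p in parents]
    return population
```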
Isn't this overkill?
Yes, and quite horribly so, given that all Paul wanted was a demonstration that this stuff was possible. The reason I'm going so overboard is that I have every intention of using the code I produce here for other things in future (there will be other challenges to answer other questions, such as the ease with which the various protein-spaces can be located in the first place). I'd also like the demonstration to be as conclusive as possible, of course.