(No I'm not talking about Crime Scene Investigation, fool!)
OK, so I really should be focusing on the Protein Challenge. However, being easily sidetracked, I got thinking about Dembski's concept of CSI - Complex Specified Information. It's a surprisingly hard concept to understand, not least because AFAICT Dembski makes it as difficult as possible to do so.
The basic principle is that evolutionary processes aren't supposed to be able to produce structures that are improbable (for a sufficiently well-defined value of "improbable"), complicated (ditto) and specified. The concept of a specification is basically an attempt to extend the basic probabilistic concept of an event to handle post-hoc reasoning. A specification defines a target space of possible things that can happen.
The argument goes as follows:
1) Chance processes tend not to give results that are both unlikely and specified. So, for example, drawing 13 cards from a standard deck and getting all spades is highly unlikely (one chance in roughly 635 billion), and you'd rightly assume that someone had tinkered with the deck.
2) Natural processes (regularities) tend not to give results that are both complex and specified. For example, though a snowflake may be complex, you won't get the same one twice.
3) Hence, anything that is complex, improbable and specified is most likely the result of intelligent intervention (nb. human brains apparently don't qualify as natural processes).
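To put a number on the card example in step 1, here's a quick sketch. It assumes a standard 52-card deck and a fair deal (my assumptions, not anything Dembski specifies), and it also shows how much the probability moves if you loosen the specification from "all spades" to "all one suit".

```python
from math import comb

# Number of distinct 13-card hands from a standard 52-card deck.
total_hands = comb(52, 13)

# Exactly one of those hands is "all thirteen spades".
p_all_spades = 1 / total_hands

# Loosening the specification to "all one suit" gives four matching hands.
p_one_suit = 4 / total_hands

print(f"possible hands:  {total_hands:,}")
print(f"P(all spades):   {p_all_spades:.3e}")
print(f"P(all one suit): {p_one_suit:.3e}")
```

The point to notice is that the probability depends on how the target is drawn: the same hand is four times "less impressive" under the looser specification.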
There are some problems when you try to extend this to evolution and genetic algorithms, since both are quite capable of generating complex, improbable, extremely useful systems. Dembski gets round this by saying that genetic algorithms can generate CSI if and only if the target space associated with the specification represents a local optimum of the fitness function. In other words, GAs work if and only if the problem you feed them (fitness function) is actually the one you want solved (specification).
That's why examples like the infamous "methinks it is like a weasel" work - the specification we choose (the text) is 'coincidentally' identical to the optimum of the fitness function. Dembski, if I understand correctly, points out that, unless we select our specification to correspond to the fitness function we're using, we still won't generate CSI. We'll have complex information, but it won't match the right specification. As such, he feels justified in saying that, in feeding the GA the right problem for our desired solution, we're "smuggling" CSI into the system.
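For anyone who hasn't seen it, here is a minimal sketch of a weasel-style program. It is my own toy version, not Dawkins' original code: the brood size, mutation rate, seed, upper-case alphabet and the choice to keep the parent in the running are all assumptions picked for illustration. The thing to look at is `fitness`, which is nothing more than "how many characters match the target", so the specification and the fitness function are the same thing by construction.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def fitness(candidate, target):
    """Count the positions where the candidate already matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def weasel(target=TARGET, brood_size=100, rate=0.05, seed=0, max_generations=10_000):
    """Cumulative selection: breed, keep the fittest string, repeat."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations):
        if parent == target:
            return generation, parent
        brood = [mutate(parent, rate, rng) for _ in range(brood_size)]
        # The parent stays in the pool, so fitness never goes backwards.
        parent = max(brood + [parent], key=lambda c: fitness(c, target))
    return max_generations, parent


generations, result = weasel()
print(f"reached {result!r} after {generations} generations")
```

Swap `TARGET` for any other string of the same alphabet and the program will find that instead, which is exactly the "smuggling" complaint: the answer is sitting inside the fitness function.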
The problem here is that, looking at the "fitness function" to which real-world genes are exposed, we see that it's basically something along the lines of "ability to survive and breed". In that context, the ability of a gene or combination of genes to produce something like a flagellum would certainly be of value for survival, and hence could represent a local optimum of the fitness function. The flagellum could evolve despite its CSI, because evolution would be selecting for the same underlying trait that we're basing our specification on - ability to live long and prosper.
Thus, simply by basing his specifications on the functionality of a system, Dembski is setting up a range of target spaces that evolution can quite definitely find. It's something of a Texas sharpshooter issue - Dembski is running round painting targets around all the areas that evolution by natural selection is naturally inclined to hit.
Complex - refers to Kolmogorov complexity, best thought of as the length of the shortest description that fully reproduces a system. So, for example, "AAAAAAAAAAAAAAAAA" would be low-complexity, "AABBCCDDEEFFGGHH" would be higher, and a random string like "BJECDWYIVFYUEUBUFIIHI" would be highest.
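Kolmogorov complexity itself can't be computed in general, but compressed size makes a rough stand-in: if a compressor can squash a string, a short description of it exists. The sketch below uses zlib as that stand-in (my choice, and only ever an upper bound), and stretches the three kinds of string above to 1,600 characters because on strings as short as the examples the compressor's fixed overhead swamps the difference.

```python
import random
import string
import zlib


def compressed_size(text):
    """Bytes needed by zlib at maximum effort: a crude proxy for description length."""
    return len(zlib.compress(text.encode("ascii"), 9))


n = 1600
rng = random.Random(0)  # fixed seed so the "random" sample is repeatable
samples = {
    "one letter repeated": "A" * n,
    "short repeating pattern": "AABBCCDDEEFFGGHH" * (n // 16),
    "random letters": "".join(rng.choice(string.ascii_uppercase) for _ in range(n)),
}

for label, text in samples.items():
    print(f"{label:25s} {len(text)} chars -> {compressed_size(text)} bytes")
```

The repeated and patterned strings should shrink to a few dozen bytes, while the random one stays close to its raw information content.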
Information - refers to Shannon information, also known as the "surprisal" of a system. So, for example, "EEEEEEEEEEEEEEEEEE" would be fairly low-information because E is a common letter - it doesn't surprise us to see it. "I LIKE FISH" would be higher, as not all of its components occur with such frequency. "XXXXXXXXXXXXXXXXXXXXXX" would be very unexpected (except in the context of really strong beer) so gets a high "surprisal" value.
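Here's a small sketch of that calculation. It treats each letter as an independent draw and uses rounded, approximate English letter frequencies (both are my simplifying assumptions, so read the figures as ballpark only). Non-letters such as spaces are skipped.

```python
from math import log2

# Approximate English letter frequencies in percent (rounded; an assumption).
FREQ_PERCENT = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 0.98,
    "K": 0.77, "J": 0.15, "X": 0.15, "Q": 0.095, "Z": 0.074,
}


def surprisal_per_letter(text):
    """Average surprisal in bits per letter, ignoring anything that isn't A-Z."""
    letters = [ch for ch in text.upper() if ch in FREQ_PERCENT]
    total_bits = sum(-log2(FREQ_PERCENT[ch] / 100) for ch in letters)
    return total_bits / len(letters)


for text in ("EEEEEEEEEEEEEEEEEE", "I LIKE FISH", "XXXXXXXXXXXXXXXXXXXXXX"):
    print(f"{text!r}: {surprisal_per_letter(text):.2f} bits per letter")
```

Under this model the run of E's comes out lowest, the run of X's highest, and "I LIKE FISH" in between. A model that looked at context rather than single letters would, of course, be rather surprised by eighteen E's in a row.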
Target space - refers to the set of states that we'd like the system to end up in. So, for example, the target space of a system composed of lots of bits of wood might be all the arrangements that count as a bookshelf.
Search space - refers to all the states that a system could end up in. So, for example, the search space of a system composed of bits of wood could include both bookshelves and mere piles of planks.
Specification - a simple delineation of the target space. For example, the specification "bookshelf".
Genetic algorithm - a program that attempts to imitate evolution in a model system.
Fitness function - something that allows a GA to tell which of a group of organisms is the most "fit". In the real world, the primary attributes of the fitness function are ability to survive (natural selection) and ability to attract mates (sexual selection).
Local optimum - an area of the search space where there are no small changes that can increase the corresponding organisms' fitness. If you think of fitness as corresponding to height on a graph, the local optima are the peaks of the resulting mountain range.
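The mountain-range picture can be sketched in a few lines. This is a toy one-dimensional landscape with made-up heights (purely hypothetical numbers), and a climber that only ever takes a single step to a strictly higher neighbour. Where it stops depends on where it starts, which is what makes an optimum "local".

```python
def hill_climb(landscape, start):
    """Step to the higher neighbour until neither neighbour is higher."""
    position = start
    while True:
        neighbours = [
            p for p in (position - 1, position + 1) if 0 <= p < len(landscape)
        ]
        best = max(neighbours, key=lambda p: landscape[p])
        if landscape[best] <= landscape[position]:
            return position  # no small change improves fitness
        position = best


# Fitness at each point in the search space; peaks at index 2 and index 7.
landscape = [1, 3, 5, 4, 2, 6, 9, 12, 8, 7]

print(hill_climb(landscape, 0))  # 2: stuck on the lower peak (height 5)
print(hill_climb(landscape, 4))  # 7: reaches the higher peak (height 12)
```

Starting from index 0 the climber never sees the taller peak, because getting there would mean walking downhill first.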