Wednesday, September 5, 2007

Detecting design

I haven't read William Dembski's books and so am not attempting to refute them. However, I have scanned his online writings, including his rebuttal of critic Richard Wein. In these intelligent design posts, my purpose is simply to think about what might constitute a strong suggestion of design, and related issues.

Dembski's essential point is that some biotic phenomena are so fantastically improbable as to constitute evidence of a designer's forethought.

One example attributed to Dembski is the case of a county clerk who "randomly" placed either the Democrat or the Republican on the top line of the ballot. In 41 elections, the Democratic clerk placed a Republican on the top line once. The probability of that exact sequence is 2^-41 ≈ 4.55 x 10^-13, and the probability that exactly one R would appear anywhere in the sequence is C(41,1) x 2^-41 ≈ 1.86 x 10^-11. Had 20 R's appeared, the probability would have been C(41,20) x 2^-41 ≈ 0.12, which lies at the center of the distribution curve. Either of the first two probabilities would fall outside a "confidence interval" of 99 percent, meaning we would have strong grounds to believe that a nonrandom force had been at work. Considering other information, such as the fact that the clerk had a motive to bias the choice, we feel quite sure that the sequence is nonrandom, though mathematically we do not have absolute proof.
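
These figures are easy to check. A minimal sketch in Python, using only the numbers given above:

from math import comb

n = 41                              # elections
p_exact = 2 ** -n                   # one specific D/R sequence
p_one_r = comb(n, 1) * p_exact      # exactly one R, anywhere in the 41 slots
p_twenty_r = comb(n, 20) * p_exact  # exactly 20 R's: the center of the distribution
print(f"{p_exact:.2e}")             # 4.55e-13
print(f"{p_one_r:.2e}")             # 1.86e-11
print(f"{p_twenty_r:.2f}")          # 0.12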

(I realize that Dembski doesn't fully accept some commonly held ideas concerning statistical inference, but I will not address those matters here.)

Another example attributed to Dembski is a sequence of coin flips (perhaps 100) that, when recorded as 0s and 1s, reveals the sequence of consecutive binary integers.
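
Assuming "consecutive binary integers" means concatenating the binary forms of 1, 2, 3 and so on (1, 10, 11, 100, ...), the specified 100-bit target is easy to construct in Python:

def consecutive_binary(n_bits):
    # Concatenate the binary forms of 1, 2, 3, ... until n_bits digits accumulate.
    bits, k = "", 1
    while len(bits) < n_bits:
        bits += format(k, "b")
        k += 1
    return bits[:n_bits]

target = consecutive_binary(100)
print(target)  # 1101110010111011110001001...

A recorded run of flips can then be checked against this single target string.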

In this case, the probability is 2^-100 ≈ 7.89 x 10^-31, a preposterously small number that would immediately make us conclude that a nonrandom force was at work. But why so? After all, every specific sequence of 100 flips has exactly that same probability. The difference lies in what we specify. In the case of a sequence containing, say, 10 heads and 5 tails, we want to know the probability associated with a subset of sequences; we do not narrow the probability down to one element. So the probability of one specific sequence is 2^-15 ≈ 0.0000305, but the probability of getting any element of that subset is C(15,10) x 2^-15 ≈ 0.092.

In the case of the binary digit series, we have reason to specify one element and so we conclude that there is reason to suspect a nonrandom force. That is, the bias in each individual toss appears to be very strong and the conjecture that the tosses are independent appears to be false. But, if we simply find that the 10 heads and 5 tails are not obviously 1-to-1 with a known sequence, then we have no reason to specify further down than the subset level. We could not easily reject the conjecture that the tosses are independent events, though we would have grounds to believe that the coin was biased. We could not rightly say that a nonrandom force was at work beyond simple weight bias.

That is, if we have reason to select one and only one element from the set of sequences (with n sufficiently large), then we have reason to suspect that the events are not independent and that there is at work some highly delineated nonrandom force.
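
A quick simulation underlines the element-versus-subset distinction (a sketch; the target sequence is an arbitrary stand-in for a specified element):

import random

random.seed(1)
trials = 200_000
target = "HHHHHHHHHHTTTTT"                 # one arbitrary specified 15-flip sequence
subset_hits = exact_hits = 0
for _ in range(trials):
    seq = "".join(random.choice("HT") for _ in range(15))
    subset_hits += (seq.count("H") == 10)  # membership in the 10-heads subset
    exact_hits += (seq == target)          # the single specified element
print(subset_hits / trials)  # about 0.092, i.e. C(15,10) x 2^-15
print(exact_hits / trials)   # about 0.00003, i.e. 2^-15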

So now we return to the issue of detecting design, or, that is, intelligent design (see previous ID post below).

A designer of a machine or network system actually designs an algorithm, which we might view as a sequence of logic gates. We are able to express this circuit in some logic language L. An algorithm would then be 1-to-1 with a grammatical sentence of L. If we have a large enough sample of expressions, we might detect the presence of L in some sequence by checking the frequency ratios of pairs of symbols of L. If the sequence is indicative of L, scatter plots of the symbol pairs will show strong correlations.

For example, we could have 30 characters of E (English) appearing randomly over 10,000 spaces. For a pair (x, y) in E x E, the scatter plot correlation will be weak. But if an excerpt from an English-language novel appears in those 10,000 spaces, the scatter plot for the pair (x, y) will be far more correlated.

An ungrammatical 10,000-character statement would be noise, but a grammatical statement would show pair correlations. This holds even if we cannot read English or L.
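
One way to summarize those pair correlations in a single number is the mutual information between adjacent symbols: near zero for independent draws, well above zero for structured text. A sketch (the 30-character alphabet and the repeated filler line are assumptions for illustration):

import math
import random
import string
from collections import Counter

def adjacent_mi(text):
    # Mutual information (bits) between each symbol and its successor.
    pairs = Counter(zip(text, text[1:]))
    singles = Counter(text)
    n_pairs, n = sum(pairs.values()), len(text)
    mi = 0.0
    for (x, y), c in pairs.items():
        pxy = c / n_pairs
        mi += pxy * math.log2(pxy / ((singles[x] / n) * (singles[y] / n)))
    return mi

random.seed(0)
alphabet = string.ascii_lowercase + " .,'"  # 30 characters
noise = "".join(random.choice(alphabet) for _ in range(10_000))
prose = ("it was the best of times, it was the worst of times, " * 200)[:10_000]
print(adjacent_mi(noise))  # close to zero: adjacent symbols uncorrelated
print(adjacent_mi(prose))  # well above zero: grammatical structure shows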

Still, a physical system must be modeled by a grammatical statement. An ungrammatical statement implies a bogus physical system (or noise).

But there is one more issue here. Machines, even relatively inefficient ones, have little internal noise in their design. So perhaps we should consider pattern recognition.

XXJXOXXXEXXX

produces the readable pattern "Joe" despite the noise. As noise increases, readability decreases, so the signal-to-noise ratio may be something to consider.

1AJMOLNYEV4MM

is so noisy that the word Joe may take far longer to discern.

So suppose we have a grammatical statement embedded in noise, where the noise is represented by dummy symbols. A pair that includes a dummy will show low correlation. Now, if we are painstaking and have a large enough sample, we might be able to distinguish the signal from the noise, even when the signal-to-noise ratio is unfavorable for routine pattern recognition.
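
One way such painstaking filtering might work (a sketch under assumptions not in the original: English text as the signal, digits inserted independently at random as the dummies, and a per-symbol divergence score standing in for the pair correlations):

import math
import random
from collections import Counter, defaultdict

def context_scores(text):
    # For each symbol s: KL divergence of P(next | s) from P(next).
    # A dummy inserted independently of position should score near zero.
    successors = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        successors[a][b] += 1
    marginal, n = Counter(text[1:]), len(text) - 1
    scores = {}
    for s, dist in successors.items():
        total = sum(dist.values())
        scores[s] = sum((c / total) * math.log2((c / total) / (marginal[b] / n))
                        for b, c in dist.items())
    return scores

random.seed(0)
signal = "the quick brown fox jumps over the lazy dog " * 500
out = []  # interleave dummy digits at a 50 percent rate
for ch in signal:
    while random.random() < 0.5:
        out.append(random.choice("0123456789"))
    out.append(ch)
scores = context_scores("".join(out))
for s in sorted(scores, key=scores.get)[:12]:
    print(repr(s), round(scores[s], 3))  # the ten dummy digits should cluster at the bottom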

Now suppose we have a sufficiently large sample that is high on noise but low on signal. Yet the signal (the grammatical statement) is there -- that is, a physical system is functioning. Would we be correct to conclude design? I think possibly not. In a large enough environment, low-entropy sequences are at least plausibly the result of independent events. That is, we may find a "grammatical" pattern in a long enough sequence of garble (remember The Bible Code?).
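
That point is easy to check: in a long enough random string, a short "meaningful" pattern is practically guaranteed. A sketch, reusing the word from the earlier example:

import random
import string

random.seed(2)
garble = "".join(random.choice(string.ascii_uppercase) for _ in range(1_000_000))
print(garble.count("JOE"))  # expect about 1,000,000 / 26**3, i.e. roughly 57 hits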

However, if the sequence containing a grammatical statement that describes the system's algorithm is low on noise, we might have grounds to suspect that the events are not independent and that some powerful nonrandom force is at work. Whether we can assume that the powerful nonrandom force has consciousness is another matter.

But let us consider computer system and internet bugs. Bugs are like noise in a system. They are usually unintentional consequences of complexity, though they might seem to be the work of a malicious hacker. But normally, though a bug might have a cascade effect (a butterfly effect), it does not replicate and does not transmit itself.

However, computer worms, viruses and Trojan horse parasite programs are an obnoxious internet presence. I would be eager to know whether there has been even one worm, virus or Trojan horse that has arisen spontaneously from a peculiar confluence of bugs or other computer oddities. So how does a computer bug differ from a malware program? It's in the code. The malware code has a much higher information quantity than does the bug. The code sequence does not arise spontaneously, though I suppose it might, given sufficient time and energy.
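
One crude proxy for that "information quantity" is compressed length. A sketch (the byte strings below are invented stand-ins, not a real patch or real malware):

import zlib

def info_bytes(data):
    # Compressed length: a rough proxy for the information a sequence carries.
    return len(zlib.compress(data, 9))

bug = b"byte at offset 0x3f2a flipped: 0x74 -> 0x75"  # a tiny deviation
worm = (b"scan the network for reachable hosts;\n"
        b"for each host, open a connection;\n"
        b"copy this program to the host;\n"
        b"schedule the copy to run and repeat the scan;\n")  # a whole algorithm
print(info_bytes(bug))   # a few dozen bytes specify the bug
print(info_bytes(worm))  # the replication logic takes considerably more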
