Wednesday, December 26, 2007

A recursion function for square roots

OK. This post is NOT a big deal. I posted a previous version on Angelfire several years ago and, after reviewing it last night, decided to polish it up just a tad.

The recursion function

X_n = (X_(n-1))^(-1) + C, with the first step taken as X_1 = X_0 + C,

is a special case of the continued fraction



C + 1/(C + 1/(C + 1/(C + ...)))



which, taken to infinity, equals (C + (C^2 + 4)^0.5)/2.

The recursion produces the convergents of that continued fraction when X_0 = 0.

However, we point out that any real other than -C can be used as the seed (barring the stray seeds whose iterates land on 0), and the result in the infinite limit is the same single value.

To see this, we write three iterations of the recursion thus:

(((X_0 + C)^(-1) + C)^(-1) + C)^(-1) + C

And if you write it out in traditional fraction form, using 1/z rather than z^(-1), it becomes obvious that

lim_(n->inf) X_0/f_n(X_0) = 0, where f_n denotes n applications of the recursion -- that is, the seed X_0 sits at the bottom of an ever-deeper fraction, and its influence on the value vanishes in the limit.

And note that if X0 =/= Y0, then at any step n of the recursive, the values are not equal. There is equality only in the infinite limit.

An example for C = 2 (the tables list X_1 through X_5 for each seed).

lim_(n->inf) X_n = (1/2)(2 + 8^0.5) = 1 + 2^0.5 = 2.414213562...


X_0 = 1

3.0
2.333...
2.428571429
2.411764706
2.414634146

X_0 = 1/2

2.5
2.4
2.41666...
2.413793103
2.414285714

X_0 = 31

33.0
2.030303...
2.492537313
2.401197605
2.416458853

X_0 = -31

-29.0
1.965517241
2.508771930
2.398601399
2.416909612

X_0 = 1/31

2.032258065
2.492063492
2.401273885
2.416445623
2.413830955

X_0 = -1/31

1.967741935
2.508196721
2.398692810
2.416893733
2.413754228
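
These sequences are easy to reproduce. Here is a minimal Python sketch (my own, not part of the original post) that generates the tables above, assuming the seeding convention X_1 = X_0 + C:

def iterate_cf(x0, c=2.0, steps=5):
    # First step: X_1 = X_0 + C; thereafter X_n = 1/X_(n-1) + C.
    x = x0 + c
    values = [x]
    for _ in range(steps - 1):
        x = 1.0 / x + c
        values.append(x)
    return values

# Each orbit approaches (C + (C**2 + 4)**0.5) / 2, i.e. 1 + 2**0.5 for C = 2.
for seed in (1, 1/2, 31, -31, 1/31, -1/31):
    print(seed, iterate_cf(seed))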

Wednesday, November 14, 2007

Probability of intelligent design

Draft 4

How might we calculate the probability that a very primitive organism occurs randomly? Herewith is an approach:

Let us grant that any organism is a machine that changes a set of inputs into a set of outputs.

In that case such an organism can be modeled by either a Turing machine or a Boolean circuit (it is easy to show that a Turing machine's control logic can be built from Boolean gates).

In that case we can choose some economical numbering scheme for any Turing machine or Boolean circuit. Perhaps we'll use Gödel numbering as a bookkeeping measure to keep track of the order of the gate numbers. We can then string a sequence of gate numbers together as one number that represents the organic machine.
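
To make the bookkeeping concrete, here is a toy sketch of such an encoding (the gate codes are my own hypothetical choices; any fixed numbering would do). The kth gate in the sequence contributes the kth prime raised to that gate's code:

GATE_CODE = {"AND": 1, "OR": 2, "NOT": 3, "NAND": 4}  # hypothetical numbering

def first_primes(n):
    # Generate the first n primes by trial division (fine at toy scale).
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
        candidate += 1
    return found

def godel_number(gates):
    # Godel-style packing: position k carries prime_k ** code(gate_k).
    number = 1
    for p, gate in zip(first_primes(len(gates)), gates):
        number *= p ** GATE_CODE[gate]
    return number

print(godel_number(["NOT", "AND", "OR"]))  # 2**3 * 3**1 * 5**2 = 600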

Supposing that we are using binary numbers, the probability of any one string occurring randomly (with each digit independent and equiprobable) is 2^-m, where m is the length of the digit string.

Now the probability that a particular string does not occur in a given trial is the complement, 1 - 2^-m. So the probability that the machine's string does not occur in n trials is (1 - 2^-m)^n (treating the trials as independent, which for overlapping windows of one long digit string is an approximation).
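
A quick sketch of the calculation (mine; it treats the n trials as independent, per the approximation just noted):

def p_at_least_one_hit(m, n):
    # 1 minus the probability that a specific m-bit string never occurs in n trials.
    return 1.0 - (1.0 - 2.0 ** -m) ** n

print(p_at_least_one_hit(16, 10**6))  # ~0.999999764, the example below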

Of course, perhaps the gates are not equiprobable, and yet our method varies probability only by string length. However, in that case there must be some primitive components of the gates that can be written as simple digit strings in which 0 and 1 are equiprobable (there seems no good reason not to take those digits as equiprobable in virtually all cases).

Using an artificially low example, a 16-digit string has a probability of occurring somewhere in a string of 10^6 digits of 1 - (1 - 2^-16)^(10^6) = 0.999999764, or virtually certain.

But the ratio 2^-16/10^6 = 1.526 x 10^-11 does not seem unreasonable on the face of it. In fact, one might expect much lower ratios.

Of course by abstracting out all the physical data, we may well be overlooking an important consideration.

Still, once physical factors are taken into account, one would expect this approach to be useful.

An interesting concept: Take the typical information value of a virus code to obtain a string of n bits. Then establish a software program that concocts strings of length n randomly. We grant the program considerable computational power -- perhaps it uses a network of parallel processors -- and determine how many bits per second it can process. Without actually running the program, we can then determine the probability that a particular virus code will emerge randomly in some specific time interval.

(We could notionally add something to each string that allows it to "escape" onto the internet. However, only a virus string will continue past the first receiver.)

A possible physical approach: atoms of each element have a probability of combining with atoms of another over some temperature range, depending on the density of each in some volume; likewise for compound molecules. These atoms and molecules can be numbered according to the scheme outlined, and the probability of a primitive cell might be worked out.

Of course, we should posit a plausible environment: sufficient heat and a good mix of chemicals in a small enough volume over a large enough time. In principle, one should be able to test whether the probability is acceptable or not for life to emerge from an inorganic stew.

I realize that probabilities have been used before to assess such a happenstance, and complexity theory has been developed in large part to address the issue of emergent order (negative entropy). In other words, emergent order implies that in some cases conditional probabilities are to be used. However, when a sample is large enough, we can often ignore conditional probabilities and treat events as independent, much as the central limit theorem suggests.

Emergent order implies that conditional probabilities become predominant at the level of pattern recognition (emergence). However, below that level, the component events can usually be treated as independent, if not as equiprobable.

Some numbers
Cray's Red Storm supercomputer has exceeded the 1 terabyte per second mark and Cray talks of a bandwidth capacity of 100 terabytes per second.

If we guess that a string of code that not only self-replicates but also self-defensively morphs must have a minimum of 256 bits, and we put to work a 100-terabyte-per-second processor spitting out binary strings of 256 digits, we obtain:

10^14(2^3) = 8 x 10^14 bits per second, versus 2^256 = 1.16 x 10^77 possible 256-bit strings.

The probability that the specific string will be generated randomly corresponds to an average time interval between hits of 2^256/(8 x 10^14) = 1.45 x 10^62 seconds, or 4.59 x 10^54 years, with the age of the cosmos being on the order of 10^10 years.
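
The arithmetic is easy to check. This sketch (mine) treats each generated bit as the start of a fresh 256-bit candidate, which is the reading that yields the figures above:

SECONDS_PER_YEAR = 3.156e7

candidates_per_second = 8e14      # 100 terabytes/sec expressed in bits
possible_strings = 2.0 ** 256     # ~1.16e77 distinct 256-bit strings

wait = possible_strings / candidates_per_second
print(wait)                       # ~1.45e62 seconds between hits, on average
print(wait / SECONDS_PER_YEAR)    # ~4.6e54 years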

How many strings of length 256 would replicate and morph? It's hard to believe the number would exceed 10^6 -- because various substrings must follow specific permutations in all cases -- and a factor of 10^6 makes a negligible difference in the probability.

What is design?
But when we detect a computer virus, why do we infer design? Because we know that such code IS designed non-randomly. We can say that in 99.999... percent of cases, investigation reveals a human agent. So we have a frequency, or assumed frequency, for a nonrandom cause for comparison.

Similarly, we may compare a digit string of length n to "common" strings of that length. For example, if we see the first nine digits of pi's decimal extension, we compare the randomness probability of 10^-9 with the fact that that sequence is associated with a set of human-discovered algorithms. We can guess that the number of practical algorithms for producing nine-digit sequences is low by comparison with 10^9. (By practical, we mean an algorithm that is used often in math, science or engineering.)

Now when it comes to an organism at stage 0 of evolution, we may find an amazingly low probability of random occurrence. But we should not ask WHY SHOULD such an event have occurred but rather WHY NOT? If a grammatical (functioning) logic gate string is one of a set of equiprobable strings, then, absent further information, there is no reason to deny that the string occurred randomly. On the other hand, if we ask, "what is the chance that such an event WILL happen," the probability tells us not to hold our breath waiting.

Now modeling a stage 0 organism as a machine represented with a blueprint of logic gates, we may ask, when is a machine designed rather than a consequence of random forces? A machine is determined to have been designed if a human-devised algorithm for its construction can be found and if experience shows that the system rarely if ever is seen naturally. Suppose we camouflage the machine such that immediate sensory clues are of little importance. So we examine the machine's operation and draw up a description, using appropriate symbolism, to see whether the description matches some subset of known human-derived algorithms or blueprints. If so, we infer design based on the higher frequency of designed machines to randomly generated machines.

So then, in order to infer design of a stage 0 organism, we would need to have available at least one human-discovered formula. But if we had that, it would radically alter the entire debate.

So how would a higher power design a stage 0 organism? No answer is given, perhaps on the assumption that the human mind could not fathom such a design. That may be true. However, one cannot show a strong probability of design without a reference base, and there are no divine blueprints that can be examined for comparison.

It should be noted that if it were argued that lifeforms had emerged spontaneously on several occasions, at a rate higher than what would be expected randomly, then we would have a basis for believing, at some confidence level, that a non-random force was at work. However, a single emergence gives little basis for drawing conclusions.

That said, one may agree that the apparent startlingly low probability associated with life's origin may well stir much inner reflection.

Saturday, October 27, 2007

The Jordan curve theorem holds for wild curves

Draft 3


Assertion
The Jordan curve theorem holds for a set of non-visualizable, "wild" curves that can be posited in accord with the Well-Ordering Principle.
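
For reference, the classical statement being extended here is standard (this rendering is mine, not the post's):

\textbf{Jordan curve theorem.} If $\gamma \subset \mathbb{R}^2$ is a simple closed
curve, then $\mathbb{R}^2 \setminus \gamma$ has exactly two connected components,
one bounded (the interior) and one unbounded (the exterior), and $\gamma$ is the
boundary of each.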

Prefatory remarks
An intuitive idea of the well-ordering of the reals can be had by considering the following:

We have that any infinite digit string of 0s and 1s may represent a real. Now the 2^n strings of length n may be ordered from least to greatest, with the understanding that any digit after position n is a 0.

This denumerable ordering holds for any finite n. However, denumerable ordering does not hold for the entire set 2^N, of course. But the Well-Ordering Principle can be interpreted to mean that the set of string permutations is precisely ordered even at 2^N.

We can obtain the set of all curves, including wild non-differentiable and non-fractal-like curves thus:

Orienting ourselves on the x-y grid, using Quadrant I for convenience, we can arbitrarily assign any y height to any x value in the interval, say, [1,2]. On this scheme, two neighboring x points can have wildly different heights, though there would still exist a slope between them. But, by the Well-Ordering Principle, there must exist two points that are precisely ordered and that have 0 distance between them. These two points will have no slope and constitute a wild, non-visualizable curve "section." That is, we have a situation where there is a y height for every real but no slope for the curve anywhere, even though y does not equal 0.

Though the area under such a curve must be less than 1 x y_max, we may find it difficult to evaluate this integral or even to give a good approximation by numerical methods.

To complete the set of all planar curves, we mention fractal and fractal-like curves, which are discussed briefly below.

Proof
We form what I call a molecule, or bubble, with the understanding that such an entity may not be visualizable with a drawing.

We may define an origin-based molecule as r = cos x where, with a well-ordering of the set of radians, r is any finite length. Additionally, r(x) = cos x is a relation with 2n - 1 values, accounting for "fingers" -- any of which may be infinitely short -- and "interior molecules" beyond the neighborhood of the origin. An interior bubble is defined in the same way as a basis bubble, except that its relation r' = cos x requires that for every value of x, r'(x) is less than r(x) and falls between the origin and r(x). (Note: an interior bubble is defined by the relation and is not considered an a priori figure here.)

This will suffice to describe any n-loop; i.e., simple loop or pretzel, though we must shortly consider some sets.

[To help one's mental picture, we can proceed thus after forming a basis molecule whereby there are no fingers or holes. We form a second molecule, which may or may not be congruent, and map it onto the first by arranging a translation of coordinates, the effect of which is to intersect the two bubbles such that they share at least two points. All points other than the join points are then erased. The join points are those which do not intersect only a bubble's points.

[We can orient a bubble any way we like and add bubbles to bubbles to our heart's content. We may get a simple loop or an n-loop pretzel.

[A pretzel may appear when two or more bubbles intersect and the intersection set is construed to be "empty" or "background." If a pretzel hole appears, then the intersection obeys the relation requirements of a molecule. A single-hole pretzel is defined by the relation in which s(1) and s(2) each have one value and there is a subset for which there are exactly four distinct values of s(x).]

Now the interval [r(x)_low, r(x)_high] is (ignoring fractals) a finite line segment. So we regard those two values as the end points of the line segment. We then require a Dedekind cut between such an end point and the end point of the corresponding, coinciding half-line. In the case of fingers and pretzel holes, we have, rather than a half-line, another line segment, of course.

Now the set of such Dedekind cuts maps as a continuous curve about a finite area with no end points. Suppose the boundary curve had two end points. In that case the relation r would have 0 or 2n values, a contradiction.

So, with respect to a molecule, we have that a point on the plane is either an element of the figure's Dedekind boundary set, an element of a line segment r = cos x such that there are 2n - 1 values of r(x) (not including the origin), or neither. So then, the Jordan curve theorem is proved for molecules -- n-loops -- with continuous or wild curves. (We have not bothered with the matter of nested sets of interior loops, which clearly follows from the preceding.)

In the matter of fractal, or fractal-like, curves, it is plain that a fractal construction at any finite step n is composed of a set of self-similar bubbles of diminishing size. Clearly our proof holds for any such step. By the way, we can see immediately that we can apply, at least notionally, a wild curve to a fractal, giving us a whole new set of fractals.

In the case of a fractal, the non-trivial fractal "slopes," though non-differentiable, take on infinitesimal form in the infinite limit, but what form does a wild fractal curve take? The wild part has 0 slope. So whether the curve can be said to exist in the algorithmic limit must be determined from consideration of axioms. At this point, I am content to point out such a situation.

The Jordan curve theorem also applies to any curve of infinite length that is found at, above or below the x axis. We have the relation r(x), which may have 2n - 1 distinct values. We then follow the arguments above and have that a point is found in a Dedekind boundary, between a Dedekind boundary point and r(a)_(2n-1), or not in the Dedekind boundary but above r(a)_max or below r(a)_min.

Wednesday, September 5, 2007

Detecting design

I haven't read William Dembski's books and so am not attempting to refute them. However, I have scanned his online writings, including his rebuttal of critic Richard Wein. In these intelligent design posts, my purpose is simply to think about what might constitute a strong suggestion of design, and related issues.

Dembski's essential point is that some biotic phenomena are so fantastically improbable as to constitute evidence of a designer's forethought.

One example attributed to Dembski is the case of a county clerk who "randomly" placed either the Democrat or the Republican on the top line of the ballot. In 41 elections, the Democratic clerk placed a Republican on the top line once. The probability of that exact sequence is 2^-41 = 4.55 x 10^-13, and the probability that exactly one R would appear anywhere in the sequence is 41C1 x 2^-41 = 1.86 x 10^-11. Had 20 R's appeared, the probability would have been 41C20 x 2^-41 = 0.12, which would be at the center of the distribution curve. Either of the first two probabilities lies outside a "confidence interval" of 99 percent, meaning we would have strong grounds to believe that a nonrandom force had been at work. Considering other information, such as the fact that the clerk had a motive to bias the choice, we feel quite sure that the sequence is nonrandom, though mathematically we do not have absolute proof.

(I realize that Dembski doesn't fully accept some commonly held ideas concerning statistical inference, but I will not address those matters here.)

Another example attributed to Dembski is a sequence of coin flips (perhaps 100) that, when recorded as 0s and 1s, reveal a sequence of consecutive binary integers.

In this case, the probability is 2^-100 = 7.89 x 10^-31, a preposterously small number that would immediately make us conclude that a nonrandom force was at work. But why so? In the case of a sequence containing, say, 10 heads and 5 tails, we want to know the probability associated with a subset of sequences. That is, we do not narrow the probability down to one element. So the probability of one specific sequence is 2^-15 = 0.0000305, but the probability of some element of that subset is 15C10 x 2^-15 = 0.09.

In the case of the binary digit series, we have reason to specify one element and so we conclude that there is reason to suspect a nonrandom force. That is, the bias in each individual toss appears to be very strong and the conjecture that the tosses are independent appears to be false. But, if we simply find that the 10 heads and 5 tails are not obviously 1-to-1 with a known sequence, then we have no reason to specify further down than the subset level. We could not easily reject the conjecture that the tosses are independent events, though we would have grounds to believe that the coin was biased. We could not rightly say that a nonrandom force was at work beyond simple weight bias.
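
All of the binomial figures above are quick to verify; a small Python check (my sketch):

from math import comb

# Clerk example: 41 independent, supposedly fair top-line choices.
print(2 ** -41)                  # ~4.55e-13: one exact sequence
print(comb(41, 1) * 2 ** -41)    # ~1.86e-11: exactly one R anywhere
print(comb(41, 20) * 2 ** -41)   # ~0.12: exactly 20 R's

# Coin example: 15 tosses with exactly 10 heads.
print(2 ** -15)                  # ~3.05e-5: one exact sequence
print(comb(15, 10) * 2 ** -15)   # ~0.092: any sequence in the 10-head subset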

That is, if we have reason to select one and only one element from the set of sequences (with n sufficiently large), then we have reason to suspect that the events are not independent and that there is at work some highly delineated nonrandom force.

So now we return to the issue of detecting design, or, that is, intelligent design (see previous ID post below).

A designer of a machine or network system actually designs an algorithm, which we might view as a sequence of logic gates. We are able to express this circuit using some logic language L. An algorithm would be 1-to-1 with any grammatical sentence in L. If we have a large enough sample of expressions, we might detect the presence of L in some sequence by checking the frequency ratios of pairs of symbols in L. If the sequence is indicative of L, the scatter plots of the symbols will show strong correlations.

For example, we could have 30 characters of E (English) appearing randomly over 10,000 spaces. For (x,y), an element of E×E, the scatter plot correlation will be weak. But if an excerpt from an English-language novel appears in those 10,000 spaces, the scatter plot for the pair (x,y) will be far more correlated.

An ungrammatical 10,000-character statement would be noise, but a grammatical statement would show pair correlations. This holds even if we can't read English or L.
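
One way to operationalize the idea (a sketch of my own; it replaces literal scatter plots with a summary statistic, comparing each adjacent pair's observed frequency against what independence would predict):

from collections import Counter
import random

def pair_dependence(text):
    # Mean |P(x,y) - P(x)P(y)| over adjacent symbol pairs; higher means
    # the pairs are more correlated, i.e., more language-like.
    n = len(text)
    singles = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    return sum(abs(count / (n - 1) - singles[x] * singles[y] / n**2)
               for (x, y), count in pairs.items()) / len(pairs)

sample = open("novel_excerpt.txt").read().lower()    # hypothetical input file
noise = "".join(random.sample(sample, len(sample)))  # same symbols, shuffled
print(pair_dependence(sample), pair_dependence(noise))  # text should score higher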

Still, a physical system must be modeled by a grammatical statement. An ungrammatical statement implies a bogus physical system (or noise).

But there is one more issue here. Machines, even relatively inefficient ones, have little internal noise in their design. So perhaps we should consider pattern recognition.

XXJXOXXXEXXX

produces a readable pattern despite the noise. As noise increases, readability decreases, so signal-to-noise ratio may be something to consider.

1AJMOLNYEV4MM

is so noisy that the word Joe may take far longer to discern.

So suppose we have a grammatical statement embedded in noise whereby the noise is represented by dummies. A pair with a dummy will show low correlation. Now, if we are painstaking and have a large enough sample, we might be able to distinguish the signal from the noise, even when the SNR is unfavorable for routine pattern recognition.

Now suppose we have a sufficient sample which is high on noise but low on signal. Yet the signal (grammatical statement) is there -- that is, a physical system is functioning. Would we be correct to conclude design? I think possibly not. In a large enough environment, low entropy sequences are at least plausibly the result of independent events. That is, we may find a "grammatical" pattern in a long enough sequence of garble (remember The Bible Code?).

However, if a sequence containing a grammatical statement describing the system's algorithm is low on noise, we might have grounds to suspect that the events are not independent and that some powerful nonrandom force is at work. Whether we can assume that the powerful nonrandom force has consciousness is another matter.

But let us consider computer system and internet bugs. Bugs are like noise in a system. They are usually unintentional consequences of complexity, though they might seem to be the work of a malicious hacker. But normally, though a bug might have a cascade effect (a butterfly effect), it does not replicate and does not transmit itself.

However, computer worms, viruses and Trojan horse parasite programs are an obnoxious internet presence. I would be eager to know whether there has been even one worm, virus or Trojan horse that has arisen spontaneously from a peculiar confluence of bugs or other computer oddities. So how does a computer bug differ from a malware program? It's in the code. Malware code has a much higher information content than a bug does. Such a code sequence does not arise spontaneously, though I suppose it might if there were sufficient time and energy.

Monday, September 3, 2007

Weeding out dummies

Below we discuss a statistical method for discriminating between signal and noise. We might be able to tell whether a transmission from deep space is a message of some kind simply by doing scatter plots of presumed symbol pairs.

This leads to a cryptological point: the use of dummies is not necessarily a protection against frequency analysis. When a dummy is paired with a symbol, the scatter plot may well show low correlation (no football shape). The code-makers must make sure in advance that every dummy pairs with every symbol in a likely ratio.

Otherwise, code crackers can check correlations and weed out all the dummies. They are then left with the simple task of doing a routine frequency analysis on the remaining symbols.

Yet I can't help thinking that, for dummies to be useful, they will either correlate poorly with symbols, or, if they are designed to correlate strongly, the correlation will not be characteristic of letter/number correlations.

Computer codes don't ordinarily rely on dummies. But secret messages are sent by all sorts of means, and low-tech messages might easily rely on dummies. If you use such a system, beware.

Thursday, August 30, 2007

Is there a God code?

In previous posts, I discussed means of assessing whether a transmission from outer space implied intelligence. The intelligent design theorists use as an example a transmission from space that follows a pattern of the prime numbers in consecutive order. Such a transmission would be immediately recognizable as a communication, or sign of intelligence. Similarly, the ID backers say, a proto-cell is astronomically unlikely to have fallen together spontaneously.

I have argued that we can't necessarily accept that a prime number transmission implies intelligence. We first have to know about primes. But anyway, it occurs to me that cryptographers and communications engineers would have little problem with such a challenge -- if by intelligence we mean something to which humans can relate.

As is well known, every English letter occurs with a characteristic frequency. Hence each pair of letters in the alphabet A (each element of A×A) also occurs in a characteristic ratio. This also holds for words, and for symbols that are part of some language L in general. That is, if a symbol is used in some form of communication of ideas, it bears a stable statistical relationship with every other symbol in L.

So if a set of discrete pulses arrived on some bandwidth, we might use scatter diagrams to check each possible pair of pulse frequencies at time interval t and see whether, for a pair (x,y), a football shape emerges, and then check the level of correlation. Of course, if the correlation is too weak, either the sample is insufficient or the pair (x,y) cannot be construed as a grammatical structure in L. If the correlation is too linear, then we would have to consider a mechanical rather than a symbolic cause, as with pulsars.

Noise would yield a group of scatter diagrams most of which lack a football shape. Those with the shape would be checked more closely to see whether a language is implied.

This technique should work in most cases, though I am not sure whether it would work in the specific case of a prime number sequence. More thought necessary. But, the argument may still proceed.

The question arises: can we discover such a symbolistic code at the level of the origin of life? We are not necessarily talking about the DNA or RNA codes, but about whether the makeup of the first cells might indicate an intelligence.

How might we determine whether some system has been designed by an intelligence rather than being a consequence of the "hidden hand" or regulator of chaotic processes? We might model the system's workings with a blueprint, perhaps a Boolean circuit whereby the symbols represent logic gates. If such a circuit C's symbols are all paired (C×C) and scatter plots are done, we would check for the telltale football shapes.

If they showed, we would be fairly sure that some intelligence was "speaking" through the blueprint. But a lack of such evidence would not rule out intelligent design, of course.

A study might be done comparing Boolean circuits of artificial and inorganic natural systems to see whether there are significant differences. This information might be used for comparison with organic or near-organic systems. If the symbol language of an organic system was fairly close to that of an artificial system yet different from an inorganic system, that would be big news. Still, lack of such evidence would not disprove intelligent design.

So the point here is that the odds against a fluke outlier may be extraordinarily high, but that isn't necessarily analogous to an improbable signal from space. In the latter case, we can apply statistical methods to estimate the likelihood of the non-random activity associated with communication versus the random pulses associated with noise.

What statistical methods can be used to check for non-randomness in the "signal" of early life forms? I gave one method above, though I do not think a positive result is likely.

Yet, we cannot foreclose the possibility that a strong statistical approach might yield surprising results.