The Turing Guide

420 | 38 BANBURISMUS REVISITED


In fact the 1 in 17 repeat rate is puzzlingly low. From a frequency count in Wikipedia we can
derive a repeat rate of 1 in 13 for everyday German language, and the language of German naval
signals might have been expected to be more stereotyped, so giving an even higher repeat rate.
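A repeat rate can be derived from a frequency count as the sum of the squared letter probabilities, which is the chance that two letters drawn independently from the same distribution coincide. The following is a minimal sketch of that calculation; the function name `repeat_rate` and the skewed counts are hypothetical illustrations, not real German data.

```python
def repeat_rate(counts):
    """Chance that two independently drawn letters are the same.

    `counts` maps each letter to its frequency count; the result is the
    sum of squared letter probabilities.
    """
    total = sum(counts.values())
    return sum((count / total) ** 2 for count in counts.values())


# A flat 26-letter alphabet gives the 'random' rate of 1 in 26.
uniform = {chr(ord("A") + i): 1 for i in range(26)}
print(1 / repeat_rate(uniform))

# Hypothetical skewed counts (illustration only, NOT measured German data):
# any unevenness in letter usage pushes the repeat rate above 1 in 26.
skewed = dict(uniform, E=8, N=5, I=4)
print(1 / repeat_rate(skewed))
```

Feeding in a real frequency table for German would give the everyday-language figure discussed above; the more uneven the letter usage, the higher the repeat rate.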
Alexander’s history skips the derivation of ‘12 times as likely’ with the phrase ‘it can be shown that’.
In fact, it depends on a composite Bayes factor. The hypothesis to be tested is that the alignment
is true. A single repeat has a probability of 1/17 when the alignment is true (because two
stretches of German naval language are being compared) and of 1/26 when the alignment is
false, so there is a Bayes factor of (1/17) ÷ (1/26), or 1.53, in favour of truth for each repeat. A
single no-repeat has a probability of 16/17 when the alignment is true, and of 25/26 when the
alignment is false, so there is a Bayes factor of (16/17) ÷ (25/26), or 0.979, in favour of truth for
each no-repeat. Such a factor, just below 1, means that each no-repeat slightly reduces the odds
on the hypothesis that the alignment is true, diminishing its credibility and making it slightly
more likely that the alignment is false.
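The two per-letter Bayes factors can be checked with exact fractions. This is only a sketch of the arithmetic in the paragraph above; the names `bayes_factor`, `P_REPEAT_TRUE`, and `P_REPEAT_FALSE` are chosen here for illustration.

```python
from fractions import Fraction

# Probability of a repeat at one position, as given in the text.
P_REPEAT_TRUE = Fraction(1, 17)   # alignment true: German against German
P_REPEAT_FALSE = Fraction(1, 26)  # alignment false: effectively random


def bayes_factor(p_if_true, p_if_false):
    """Ratio of the probabilities of one observation under the two hypotheses."""
    return p_if_true / p_if_false


repeat_factor = bayes_factor(P_REPEAT_TRUE, P_REPEAT_FALSE)
no_repeat_factor = bayes_factor(1 - P_REPEAT_TRUE, 1 - P_REPEAT_FALSE)

print(repeat_factor, round(float(repeat_factor), 2))        # 26/17 1.53
print(no_repeat_factor, round(float(no_repeat_factor), 3))  # 416/425 0.979
```

A factor above 1 raises the odds on a true alignment and a factor below 1 lowers them, which is why each no-repeat counts slightly against the hypothesis.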
Alexander’s example has 7 repeats and 25 no-repeats in a stretch of 32 letters, and the Bayes
factors for the successive events have to be combined to give a composite factor. Suppose (for
the moment) that the occurrences of consecutive letters are independent of each other. Such
a supposition is patently false, but persevering with it regardless, an approach called ‘naive Bayes’,
often scarcely affects the conclusion. The individual factors can then simply be multiplied, and the
resulting composite factor in favour of truth is 1.53^7 × 0.979^25 = 11.6. This rounds up to 12,
confirming the phrase ‘about 12 times as likely’ in the quotation from Alexander.
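Under the independence assumption, the composite factor is just a product of per-letter factors. The sketch below uses the rounded factors quoted in the text (1.53 and 0.979); the function name `composite_factor` is an assumption made for illustration.

```python
# Per-letter Bayes factors, rounded as in the text.
REPEAT_FACTOR = 1.53
NO_REPEAT_FACTOR = 0.979


def composite_factor(repeats, no_repeats):
    """Multiply the per-letter factors, treating letters as independent."""
    return REPEAT_FACTOR ** repeats * NO_REPEAT_FACTOR ** no_repeats


# Alexander's example: 7 repeats and 25 no-repeats in 32 letters.
factor = composite_factor(7, 25)
print(round(factor, 1))
```

The result comes out at roughly 11.5; the last decimal place depends on how the intermediate values are rounded, and the figure is consistent with ‘about 12 times as likely’.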
After deriving ‘about 12 times as likely’, Alexander went on to say:


In the method of units and scoring used by us (logarithmic scoring) it would be a score of +22.


The derivation of this figure can also be reconstructed. Much ingenuity went into manipulating
the factors so that testing could be done faster and by less skilled staff. First the factors were
replaced by their logarithms, so that addition could replace multiplication, and were called
‘scores’. The above Bayes factor of 1.53 for a repeat has logarithm 0.1847. Since bigger numbers
are easier to handle, Hut 8 moved to scoring units of one-tenth the size, giving a score of 1.85 for
one repeat. Later they halved this unit for greater facility, so one repeat then scored 3.7 and the
seven in the example added up to 25.9. In practice Hut 8 may have rated speed over precision
and rounded these numbers.
For a no-repeat the Bayes factor of 0.979 has logarithm –0.0092, or (on multiplying by 20 for
the two stages above) –0.184. This is tiny in itself, but there are many no-repeats. For the 25 in
the example, the score is –4.6. This diminishes the 25.9 to 21.3. This is very close to ‘a score of
+22’, as Alexander wrote. Perhaps the small difference arises from different rounding at some
point in the calculations.
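The logarithmic scoring can be reconstructed in the same way. This is a sketch under the assumptions stated in the text (base-10 logarithms, a unit of one-tenth, then halved, so a multiplier of 20); the names `score` and `UNITS_PER_LOG10` are hypothetical, and the intermediate rounding Hut 8 actually used is not known.

```python
import math

# One-tenth of a base-10 log unit, then halved: 20 scoring units per log10.
UNITS_PER_LOG10 = 20


def score(factor):
    """Convert a Bayes factor into additive scoring units."""
    return UNITS_PER_LOG10 * math.log10(factor)


repeat_score = score(1.53)      # about +3.7 per repeat
no_repeat_score = score(0.979)  # about -0.18 per no-repeat

# Alexander's example: scores are added instead of factors being multiplied.
total = 7 * repeat_score + 25 * no_repeat_score
print(round(repeat_score, 2), round(no_repeat_score, 3), round(total, 1))
```

Without intermediate rounding the total is a little over 21, and converting it back with `10 ** (total / 20)` recovers the composite factor of about 11.5, so the score and the factor carry the same information.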
As here, Alexander was always on the lookout for ways of streamlining procedures. He
wrote:^8


We found that by splitting up a job into as many separate parts as possible and by having a really
adequate set of scoring tables . . . the whole process [was] reduced to a matter of looking things
up in a number of different tables and the only mathematical operations involved were addition
and subtraction.


The histories by Alexander and Mahon do not mention prior odds, nor a threshold score that
had to be reached before an alignment was adopted as probably correct. In practice, scores
were perhaps seen as one component rather than a sole decider, and an experienced Banburist
