We now apply the same reasoning to a bulb’s passing the test. Out of a batch of 100 a true
tester will pass 80. For a faulty tester to pass a bulb, two things are necessary: the bulb must be
one of the 80 good ones, and the tester must be behaving correctly, which it does three times out
of four. On average, 60 of the 100 (i.e. 3/4 × 80) will meet both conditions. It is thus 3/4 times as
likely (60 against 80)
that a single bulb will pass the test with a faulty tester as with a true one. This means a Bayes
factor of 3/4 in favour of the hypothesis that my tester is faulty (the factor being below 1 makes
the hypothesis less likely).
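A minimal sketch, assuming the rates given above (a true tester passes 80 of 100 bulbs; a faulty tester behaves correctly three times out of four), shows how the two Bayes factors of 2 and 3/4 arise:

```python
from fractions import Fraction

# Assumed rates from the example: a true tester passes 80 of 100 bulbs;
# a faulty tester behaves correctly only three times in four.
p_pass_true = Fraction(80, 100)                 # chance a true tester passes a bulb
p_pass_faulty = Fraction(3, 4) * p_pass_true    # faulty tester: good bulb AND good behaviour
p_blow_true = 1 - p_pass_true
p_blow_faulty = 1 - p_pass_faulty

# Bayes factors in favour of "my tester is faulty" for each outcome
bf_blown = p_blow_faulty / p_blow_true          # 2:   a blown bulb supports "faulty"
bf_passed = p_pass_faulty / p_pass_true         # 3/4: a passed bulb counts against "faulty"

print(bf_blown, bf_passed)                      # 2 3/4
```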
The prior odds for my tester being faulty (as explained by its supplier) are 99:1 against, or
1:99 on, because 1% of the testers in the suspect batch were faulty. The evidence that aroused
my suspicion—five consecutive blown bulbs each with a Bayes factor of 2—applies a composite
factor of 2^5 = 32 to the prior odds of 1:99 on the tester being faulty. So the posterior odds are
about 1:3 that the tester is faulty (or 3:1 that it is good). This is inconclusive. Perhaps those five
blown bulbs were a freak.
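The arithmetic of combining the prior odds with a composite Bayes factor can be sketched the same way (representing odds 'on' as a single fraction is an illustrative convention, not the author's notation):

```python
from fractions import Fraction

prior_odds = Fraction(1, 99)        # 1:99 on the tester being faulty (1% of the batch was faulty)
composite_factor = 2 ** 5           # five consecutive blown bulbs, Bayes factor 2 each
posterior_odds = prior_odds * composite_factor

print(posterior_odds, float(posterior_odds))    # 32/99, about 0.32, i.e. roughly 1:3 on
```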
I now test another ten bulbs and find that four are blown and six are accepted. The further
composite factor resulting from this, taking the respective Bayes factors, is 2^4 × (3/4)^6 = 2.85,
and this moves the posterior odds from about 1:3 that the tester is faulty to just under evens.
This is still inconclusive. Another 30 tests deliver 12 bulbs blown and 18 accepted, bringing a
further composite Bayes factor of 2.85^3 = 23.15. The posterior odds now move from just under
evens to 21:1 on the hypothesis that the tester is faulty. This passes beyond what statisticians
commonly accept as the first-level criterion for evidence establishing truth: 95% confidence, or
odds of 19:1 on. The tester must be discarded.
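Continuing the same sketch, the two further batches of tests update the odds step by step:

```python
from fractions import Fraction

odds = Fraction(32, 99)                  # odds on "faulty" after the first five blown bulbs

# Ten more bulbs: four blown (factor 2 each), six passed (factor 3/4 each)
batch = 2 ** 4 * Fraction(3, 4) ** 6     # about 2.85
odds *= batch
print(float(odds))                       # about 0.92, just under evens

# Thirty more bulbs: twelve blown and eighteen passed, i.e. three more such batches
odds *= batch ** 3                       # further factor of about 23
print(float(odds))                       # about 21, so roughly 21:1 on "faulty"
```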
weighing the evidence
To return to Banburismus and to how much evidence was ‘enough’. Alexander’s history sets up
an example of two messages set in correct alignment, which provide an overlap of 32 letters
enciphered at the same machine positions. In 7 of those 32, the same letter occurs in both mes-
sages: 7 repeats. He continues:^5
It is fairly obvious that the repeat rate [of letters] for plain language is higher than that of ran-
dom cypher material . . . Random 1/26 . . . German Naval 1/17 . . . This provides us with a
criterion for testing whether or not a given [alignment^6 ] is correct, viz. the number of repeats
or ‘score’ obtained by writing out the messages concerned at that [alignment]. Going back to
our example . . . the ‘score’ is 7 repeats . . . in a stretch of 32 letters. To determine the merits of
this score it is necessary to know whether it is more likely that such a result would arise from a
true [alignment] or by chance—and how much more likely one way is than the other. It can be
shown that it is about 12 times as likely that this particular score would arise in a true as in a false
position.
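Alexander's 'about 12' can be reproduced under a simplifying assumption: treat each of the 32 overlapping positions as an independent trial, with repeat probability 1/17 at a true alignment and 1/26 at a false one (the binomial coefficient is common to both hypotheses and cancels from the ratio).

```python
# Repeat rates from the quotation: 1/17 for German naval plain language
# (true alignment), 1/26 for random cipher text (false alignment).
p_true = 1 / 17
p_false = 1 / 26

repeats, overlap = 7, 32

# Probability of this score under each hypothesis, treating the 32 positions
# as independent trials; the binomial coefficient cancels in the ratio.
like_true = p_true ** repeats * (1 - p_true) ** (overlap - repeats)
like_false = p_false ** repeats * (1 - p_false) ** (overlap - repeats)

print(like_true / like_false)   # about 11.5, Alexander's "about 12 times as likely"
```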
Neither Alexander’s history nor Mahon’s (see Chapter 13) explains the 1 in 17 repeat rate
used for German naval language. Earlier workers had used 1 in 20 in 1920 and 1 in 16.5 in 1940.
Alexander records that the capture of the June and July 1941 keys gave them an opportunity to
overhaul their methods:^7
A fresh statistical investigation into the frequency of repeats between messages was made and
our old figures revised.