FUNDAMENTAL BUILDING BLOCKS 289
Colossus, the world’s first electronic
programmable computer, was made in
1943 to crack codes at Bletchley Park
in England. Michie trained staff to
use the computer.
Next, the opponent positioned
their first X. For the second turn
of MENACE, the matchbox was
selected that corresponded to
the positions of the X and O
on the grid at this time. Again the
matchbox was opened, the tray
shaken and tilted, and the color
of the randomly selected bead
determined the position of
MENACE’s second O. The
opponent placed their second X.
And so on, with MENACE’s sequence
of beads, and so its moves, recorded.
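
The bead draw described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary of matchboxes, the board-state string, the bead counts, and the function name draw_bead are all hypothetical, not details from Michie's machine. A bead is represented by the index (0 to 8) of the square it stands for.

```python
import random

def draw_bead(matchboxes, state, history, rng=random):
    """Draw one random bead from the matchbox for `state` and record it."""
    box = matchboxes[state]          # the matchbox matching the current grid
    bead = rng.choice(box)           # stands in for shaking and tilting the tray
    box.remove(bead)                 # the drawn bead is set aside during play
    history.append((state, bead))    # remember which box gave which move
    return bead

# Hypothetical box: X on square 0, O on square 4, beads only for empty squares.
matchboxes = {"X...O....": [1, 1, 2, 2, 3, 3]}
history = []
move = draw_bead(matchboxes, "X...O....", history, random.Random(0))
```

Because each color appears several times in the box, the chance of a move is simply its share of the beads in that tray.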
Win, lose, draw
Eventually there came a result.
If MENACE won, it received
reinforcement or a “reward.”
The removed beads showed the
sequence of winning moves. Each
of these beads was put back in its
box, identified by the code number
and slightly open tray. The tray
also received three extra “bonus”
beads of the same color. As a
consequence, in a future game, if
the same permutation of Os and
Xs occurred on the grid, this
matchbox would come into play
again—and it had more of the
beads that previously led to a win.
The chances of choosing that bead,
and so the same move and another
possible win, were increased.
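
As a rough worked example of how much the chances increase, suppose (hypothetically) that a matchbox held n beads before the game, k of them in the winning color. After the drawn bead is returned with three bonus beads, the probability of drawing that color changes as follows:

```latex
P_{\text{before}} = \frac{k}{n}, \qquad
P_{\text{after}} = \frac{k+3}{n+3}, \qquad
P_{\text{after}} - P_{\text{before}} = \frac{3(n-k)}{n(n+3)} \ge 0
```

With the illustrative values n = 8 and k = 2, the chance rises from 2/8 = 0.25 to 5/11, or about 0.45. The values of n and k are assumptions chosen for the arithmetic, not bead counts from the original machine.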
If MENACE lost, it was
“punished” by not receiving
back the removed beads, which
represented the losing sequence
of moves. But this was still
positive. In future games, if the
same permutation of Xs and Os
cropped up, the beads designating
the same move as the previous time
were either fewer in number or
absent, thereby lessening the
chance of another loss.
See also: Alan Turing 252–53
For a draw, each bead from that
game was replaced in its relevant
box, along with a small reward,
one bonus bead of the same
color. This increased the chances
of that bead being selected if the
same permutation came around
again, but not as much as the
win with three bonus beads.
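
The three outcomes described above amount to a simple update rule, sketched here in Python. The bonus sizes (three beads for a win, one for a draw, none returned after a loss) come from the text; the data layout and the names reinforce, boxes, and played are hypothetical. The sketch assumes each drawn bead was taken out of its box during play.

```python
WIN_BONUS = 3    # three extra beads of the same color after a win
DRAW_BONUS = 1   # one extra bead of the same color after a draw

def reinforce(matchboxes, history, result):
    """Update the boxes after a game.

    `history` holds (state, bead) pairs for beads removed during play.
    """
    if result not in ("win", "draw", "loss"):
        raise ValueError("result must be 'win', 'draw', or 'loss'")
    for state, bead in history:
        if result == "loss":
            continue                 # punished: the bead is not returned
        bonus = WIN_BONUS if result == "win" else DRAW_BONUS
        # Return the drawn bead together with its bonus beads.
        matchboxes[state].extend([bead] * (1 + bonus))

# Hypothetical game: bead 1 was drawn from one box and bead 7 from another.
boxes = {"state-a": [0, 2], "state-b": [5]}
played = [("state-a", 1), ("state-b", 7)]
reinforce(boxes, played, "win")
```

After a loss the boxes are left as they were at the end of play, so the losing beads stay out of circulation.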
Michie’s goal was that MENACE
would “learn from experience.” For
given permutations of Os and Xs,
when a certain sequence of moves
had been successful, it should
gradually become more likely, while
moves that led to losses would
become less likely. It should
progress by trial and error, adapt
with experience, and, with more
games, become more successful.
Controlling variables
Michie considered potential
problems. What if the selected
bead from a tray decreed that
MENACE’s O should be placed
on a square already occupied
by an O or X? Michie accounted
for this by ensuring that each
matchbox contained only beads
corresponding to empty squares for
its particular permutation. So the
Each of the 304
matchboxes in MENACE
represented a possible
state of the board. The
beads inside the boxes
represented each possible
move for that state.
The bead at the bottom
of the “V” determined the
move. As games went
on, winning beads were
reinforced and losing
ones removed, allowing
MENACE to learn from
its experience.
[Figure labels: State of play; Bead indicating move]