The Science Book

FUNDAMENTAL BUILDING BLOCKS

Colossus, the world’s first electronic programmable computer, was made in 1943 to crack codes at Bletchley Park in England. Michie trained staff to use the computer.

Next, the opponent positioned their first X. For MENACE’s second turn, the matchbox was selected that corresponded to the positions of the X and O on the grid at that point. Again the matchbox was opened, the tray shaken and tilted, and the color of the randomly selected bead determined the position of MENACE’s second O. The opponent then placed their second X, and so on, with MENACE’s sequence of beads, and therefore its moves, recorded throughout.
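The bead draw described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `draw_bead`, `take_turn`, `boxes`, and `history` are hypothetical, each matchbox is modeled as a dictionary from a square index (0 to 8) to a bead count, and shaking and tilting the tray is approximated by a weighted random choice.

```python
import random


def draw_bead(box, rng=random):
    """Draw one bead at random from a matchbox.

    `box` maps a move (a square index, 0-8) to the number of beads
    of that move's color currently in the tray (an assumed model).
    Returns the chosen move, or None if the tray is empty.
    """
    moves = [move for move, count in box.items() if count > 0]
    if not moves:
        return None
    weights = [box[move] for move in moves]
    # More beads of one color means a higher chance of that move.
    return rng.choices(moves, weights=weights, k=1)[0]


def take_turn(boxes, state, history, rng=random):
    """Pick MENACE's move for `state` and record it for later reinforcement."""
    move = draw_bead(boxes[state], rng)
    if move is not None:
        boxes[state][move] -= 1        # the drawn bead stays out of the box
        history.append((state, move))  # remember which box and which bead
    return move
```

Keeping the `history` list mirrors the recorded sequence of beads: it is what later allows each bead to be returned to, or withheld from, the right box.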

Win, lose, draw
Eventually there came a result. If MENACE won, it received reinforcement, or a “reward.” The removed beads showed the sequence of winning moves. Each of these beads was put back in its box, which was identified by its code number and slightly open tray. The tray also received three extra “bonus” beads of the same color. As a consequence, if the same permutation of Os and Xs occurred on the grid in a future game, this matchbox would come into play again, and it would hold more of the beads that had previously led to a win.

The chances of choosing that bead, and so the same move and another possible win, were increased. If MENACE lost, it was “punished” by not receiving back the removed beads, which represented the losing sequence of moves. This was still a useful outcome: in future games, if the same permutation of Xs and Os cropped up, the beads designating the same move as before were either fewer in number or absent, lessening the chance of another loss.

See also: Alan Turing 252–53

For a draw, each bead from that game was replaced in its relevant box, along with a small reward, one bonus bead of the same color. This increased the chances of that bead being selected if the same permutation came around again, but not as much as the win with three bonus beads. Michie’s goal was that MENACE would “learn from experience.” For given permutations of Os and Xs, when a certain sequence of moves had been successful, it should gradually become more likely, while moves that led to losses would become less likely. It should progress by trial and error, adapt with experience, and with more games, become more successful.
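The reward scheme for wins, losses, and draws can be written as a single update step. The sketch below assumes the same hypothetical model as before: `boxes` maps a board state to a dictionary of bead counts per move, and `history` lists the (state, move) pairs whose beads were taken out during the game. Only the bonus sizes (three beads for a win, one for a draw) come from the text; the function and variable names are illustrative.

```python
WIN_BONUS = 3   # from the text: three extra beads after a win
DRAW_BONUS = 1  # from the text: one extra bead after a draw


def reinforce(boxes, history, result):
    """Update bead counts after a game.

    `history` holds (state, move) pairs for beads removed during play.
    `result` is "win", "draw", or "loss" from MENACE's point of view.
    """
    if result not in ("win", "draw", "loss"):
        raise ValueError(f"unknown result: {result!r}")
    for state, move in history:
        if result == "win":
            boxes[state][move] += 1 + WIN_BONUS   # bead returned plus bonus
        elif result == "draw":
            boxes[state][move] += 1 + DRAW_BONUS  # bead returned plus bonus
        # On a loss the removed bead is simply not returned.
```

As a worked example under an assumed starting count of four beads for each of two moves: the chance of the drawn move is 4/8 before the game, 7/11 after a win, 5/9 after a draw, and 3/7 after a loss.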

Controlling variables
Michie considered potential problems. What if the selected bead from a tray decreed that MENACE’s O should be placed on a square already occupied by an O or X? Michie accounted for this by ensuring that each matchbox contained only beads corresponding to empty squares for its particular permutation. So the ❯❯

Each of the 304 matchboxes in MENACE represented a possible state of the board. The beads inside the boxes represented each possible move for that state. The bead at the bottom of the “V” determined the move. As games went on, winning beads were reinforced and losing ones removed, allowing MENACE to learn from its experience.

[Diagram labels: State of play; Bead indicating move]
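The rule that a matchbox holds beads only for empty squares can be illustrated with a small constructor. This is a sketch under assumptions, not a record of how the boxes were actually filled: the board is modeled as a hypothetical nine-character string read row by row, and the starting number of beads per move (`beads_per_move`) is an arbitrary placeholder, since the passage does not state it.

```python
def new_matchbox(state, beads_per_move=1):
    """Create the bead tray for one board state.

    `state` is a 9-character string of 'X', 'O', or ' ' (assumed encoding).
    Only empty squares receive beads, so an illegal move can never be drawn.
    """
    if len(state) != 9 or any(c not in "XO " for c in state):
        raise ValueError("state must be 9 characters of 'X', 'O', or ' '")
    return {i: beads_per_move for i, c in enumerate(state) if c == " "}


# Example: X in squares 0 and 4, O in square 1; all other squares empty.
box = new_matchbox("XO  X    ")
print(sorted(box))  # [2, 3, 5, 6, 7, 8]
```

Because occupied squares never get a key in the dictionary, no bead can point MENACE at a square that already holds an O or X.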
