Hacking Secret Ciphers with Python

Chapter 12 – Detecting English Programmatically 167

A message encrypted with the transposition cipher can have thousands of possible keys. Your
computer can still easily brute-force this many keys, but you would then have to look through
thousands of decryptions to find the one correct plaintext. This is a big problem for the brute-
force method of cracking the transposition cipher.

When the computer decrypts a message with the wrong key, the resulting plaintext is garbage
text. We need to program the computer to be able to recognize if the plaintext is garbage text or
English text. That way, if the computer decrypts with the wrong key, it knows to go on and try the
next possible key. And when the computer tries a key that decrypts to English text, it can stop and
bring that key to the attention of the cryptanalyst. Now the cryptanalyst won’t have to look
through thousands of incorrect decryptions.

How Can a Computer Understand English?

It can’t. At least, not in the way that human beings like you or I understand English. Computers
don’t really understand math, chess, or lethal military androids either, any more than a clock
understands lunchtime. Computers just execute instructions one after another. But these
instructions can mimic very complicated behaviors that solve math problems, win at chess, or
hunt down the future leaders of the human resistance.

Ideally, what we need is a Python function (let’s call it isEnglish()) that has a string passed
to it and then returns True if the string is English text and False if it’s random gibberish. Let’s
take a look at some English text and some garbage text and try to see what patterns the two have:

Robots are your friends. Except for RX- 6 86. She will try to eat you.

ai-pey e. xrx ne augur iirl 6 Rtiyt fhubE6d hrSei t8..ow eo.telyoosEs t

One thing we can notice is that the English text is made up of words that you could find in a
dictionary, but the garbage text is made up of words that you won’t. Splitting up the string into
individual words is easy. There is already a Python string method named split() that will do
this for us (this method will be explained later). The split() method just sees when each word
begins or ends by looking for the space characters. Once we have the individual words, we can
test to see if each word is a word in the dictionary with code like this:

if word == 'aardvark' or word == 'abacus' or word == 'abandon' or word ==
'abandoned' or word == 'abbreviate' or word == 'abbreviation' or word ==
'abdomen' or ...

We can write code like that, but we probably shouldn’t. The computer won’t mind running
through all this code, but you wouldn’t want to type it all out. Besides, somebody else has already

Hacking Secret Ciphers with Python

How Can a Computer Understand English?

Get our desktop app

Company

Features

Documentation

Resources