Hacking Secret Ciphers with Python


168 http://inventwithpython.com/hacking


Email questions to the author: [email protected]


typed out a text file full of nearly all English words. These text files are called dictionary files.
So we just need to write a function that checks if the words in the string exist somewhere in that
file.


Remember, a dictionary file is a text file that contains a large list of English words. A dictionary
value is a Python value that has key-value pairs.


Not every word will exist in our “dictionary file”. Maybe the dictionary file is incomplete and
doesn’t have the word, say, “aardvark”. There are also perfectly good decryptions that might have
non-English words in them, such as “RX-686” in the English sentence above. (Or maybe the
plaintext is in a language other than English. But we’ll just assume it is in English for
now.)


And garbage text might just happen to have an English word or two in it by coincidence. For
example, it turns out the word “augur” means a person who tries to predict the future by studying
the way birds are flying. Seriously.


So our function will not be foolproof. But if most of the words in the string argument are English
words, it is a good bet that the string is English text. There is a very low probability that a
ciphertext will decrypt to English if it is decrypted with the wrong key.
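To make “most of the words” concrete, here is a small sketch that computes what fraction of a message’s words appear in a collection of English words. The function name english_word_fraction(), the tiny SAMPLE_WORDS dictionary, and both example messages are made up for this illustration. A real program would load the full dictionary file instead. This version splits only on whitespace, so punctuation stuck to a word (for example “HIDDEN.”) would keep that word from matching.

```python
def english_word_fraction(message, english_words):
    """Return the fraction (0.0 to 1.0) of whitespace-separated words found in english_words."""
    words = message.upper().split()
    if not words:
        return 0.0  # avoid dividing by zero on an empty message
    matches = sum(1 for word in words if word in english_words)
    return matches / len(words)


# Hypothetical stand-in for the real dictionary file; keys are uppercase words.
SAMPLE_WORDS = dict.fromkeys(["THE", "SECRET", "CODE", "IS", "HIDDEN", "IN", "AUGUR"])

# A made-up correct decryption containing one non-English token:
print(english_word_fraction("The secret code is hidden in RX-686", SAMPLE_WORDS))  # 6 of 7 words, about 0.857
# Made-up garbage that happens to contain one real word:
print(english_word_fraction("Qzx augur plmv wrt kkj", SAMPLE_WORDS))  # 1 of 5 words, 0.2
```

The non-English token only lowers the first score a little. The single coincidental word in the garbage barely raises the second score. This is why looking at the proportion of matching words works better than checking any one word.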


The dictionary text file will have one word per line in uppercase. It will look like this:


AARHUS
AARON
ABABA
ABACK
ABAFT
ABANDON
ABANDONED
ABANDONING
ABANDONMENT
ABANDONS


...and so on. You can download this entire file (which has over 45,000 words) from
http://invpy.com/dictionary.txt.
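A dictionary value is a good fit for holding these words. Checking `word in some_dict` takes about the same time no matter how many keys there are, while searching a list means looking through the items one at a time. The sketch below reads one-word-per-line text into a dictionary value whose keys are the words. The function name load_dictionary() is chosen just for this illustration, and the usage lines assume the downloaded file was saved as dictionary.txt in the current directory.

```python
def load_dictionary(lines):
    """Build a dictionary value from one-word-per-line text.

    `lines` can be an open file object or any iterable of strings.
    Each word becomes a key; the value is None because only the keys matter.
    """
    english_words = {}
    for line in lines:
        word = line.strip().upper()  # remove the trailing newline; normalize case
        if word:                     # skip blank lines, such as one at the end of the file
            english_words[word] = None
    return english_words


# Usage, assuming dictionary.txt was downloaded into the current directory:
# with open("dictionary.txt") as dictionary_file:
#     english_words = load_dictionary(dictionary_file)
# print("ABANDON" in english_words)
```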


Our isEnglish() function will have to split a decrypted string into words and check whether each
word is in a file full of thousands of English words. If a certain percentage of the words are
English words, then we will say that the text is in English. And if the text is in English, then
there’s a good bet that we have decrypted the ciphertext with the correct key.
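Those steps can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation. The names split_into_words() and is_english() are hypothetical. Dropping digits and punctuation before splitting is a choice made for this sketch. The default threshold of 0.5 is one assumed reading of “most of the words”.

```python
import string

# Uppercase letters A-Z; any other character except whitespace is stripped out.
LETTERS = set(string.ascii_uppercase)


def split_into_words(message):
    """Uppercase the message, drop non-letters (keeping whitespace), and split."""
    kept = "".join(ch for ch in message.upper() if ch in LETTERS or ch.isspace())
    return kept.split()


def is_english(message, english_words, threshold=0.5):
    """Return True if at least `threshold` of the message's words are English.

    english_words is a dictionary value (or set) whose keys are uppercase words.
    threshold=0.5 is an assumed cutoff, not a value taken from the text.
    """
    words = split_into_words(message)
    if not words:
        return False  # nothing to judge, so don't call it English
    matches = sum(1 for word in words if word in english_words)
    return matches / len(words) >= threshold


# Hypothetical mini-dictionary standing in for the full dictionary file.
english_words = dict.fromkeys(["THE", "SECRET", "CODE", "IS", "HIDDEN", "IN"])
print(is_english("The secret code is hidden in RX-686.", english_words))  # True
print(is_english("Qzx mvlp wrt kkj oiu", english_words))                   # False
```

Lowering the threshold makes the check more forgiving of names and words missing from the dictionary file. Raising it makes it less likely that garbage text slips through.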


And that is how the computer can tell whether a string is English or gibberish.