Hacking Secret Ciphers with Python


178 http://inventwithpython.com/hacking


Email questions to the author: [email protected]


First we must create a list of individual word strings from the string in message. Line 25
converts the string to uppercase letters. Line 26 then removes the non-letter characters, such as
numbers and punctuation, by calling removeNonLetters(). (We will see how this function
works later.) Finally, the split() method on line 27 splits the string into individual words,
and the resulting list is stored in a variable named possibleWords.


So if the string 'Hello there. How are you?' was passed when
getEnglishCount() was called, the value stored in possibleWords after lines 25 to 27
execute would be ['HELLO', 'THERE', 'HOW', 'ARE', 'YOU'].
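
These three steps can be tried outside the program. The sketch below is only an illustration: strip_non_letters() and LETTERS_AND_SPACE are hypothetical stand-ins invented here, not the book's removeNonLetters(), which is explained later.

```python
import string

# Assumption: keep ASCII letters and whitespace, drop everything else.
LETTERS_AND_SPACE = string.ascii_letters + ' \t\n'

def strip_non_letters(message):
    # Hypothetical stand-in for removeNonLetters().
    return ''.join(ch for ch in message if ch in LETTERS_AND_SPACE)

message = 'Hello there. How are you?'
message = message.upper()               # 'HELLO THERE. HOW ARE YOU?'
message = strip_non_letters(message)    # 'HELLO THERE HOW ARE YOU'
possible_words = message.split()        # split on whitespace
print(possible_words)  # ['HELLO', 'THERE', 'HOW', 'ARE', 'YOU']
```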


detectEnglish.py


 29.     if possibleWords == []:
 30.         return 0.0 # no words at all, so return 0.0


If the string in message was something like '12345', all of these non-letter characters would
have been taken out of the string returned from removeNonLetters(). The call to
removeNonLetters() would return the blank string, and when split() is called on the
blank string, it will return an empty list.


Line 29 does a special check for this case, and returns 0.0. This is done to avoid a “divide-by-
zero” error (which is explained later on).
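
To see why the check matters, the empty-list case can be reproduced in a few lines. This is a sketch under assumptions: strip_non_letters() is a hypothetical stand-in for removeNonLetters(), and the division at the end only imitates the kind of calculation the function performs later.

```python
import string

# Assumption: keep ASCII letters and whitespace, drop everything else.
LETTERS_AND_SPACE = string.ascii_letters + ' \t\n'

def strip_non_letters(message):
    # Hypothetical stand-in for removeNonLetters().
    return ''.join(ch for ch in message if ch in LETTERS_AND_SPACE)

cleaned = strip_non_letters('12345'.upper())
possible_words = cleaned.split()
print(repr(cleaned))     # ''
print(possible_words)    # []

# Without the empty-list check, dividing by the number of words fails.
try:
    ratio = 0 / len(possible_words)
    division_failed = False
except ZeroDivisionError:
    division_failed = True
print(division_failed)   # True
```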


detectEnglish.py


 32.     matches = 0
 33.     for word in possibleWords:
 34.         if word in ENGLISH_WORDS:
 35.             matches += 1


The float value that is returned from getEnglishCount() ranges between 0.0 and 1.0. To
produce this number, we divide the number of words in possibleWords that are recognized as
English by the total number of words in possibleWords.


The first part of this is to count the number of recognized English words in possibleWords,
which is done on lines 32 to 35. The matches variable starts off as 0. The for loop on line 33
loops over each of the words in possibleWords and checks whether the word exists in the
ENGLISH_WORDS dictionary. If it does, the value in matches is incremented on line 35.
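
The counting loop and the division can be sketched together as follows. The tiny ENGLISH_WORDS dictionary and the word list here are made-up values for illustration; in the real program the dictionary is built from dictionary.txt.

```python
# Assumption: a tiny stand-in dictionary; only the keys matter for the "in" check.
ENGLISH_WORDS = {'HELLO': None, 'THERE': None, 'HOW': None, 'ARE': None, 'YOU': None}

# Made-up input: three recognized words and one that is not in the dictionary.
possible_words = ['HELLO', 'THERE', 'XQZ', 'YOU']

matches = 0
for word in possible_words:
    if word in ENGLISH_WORDS:    # checks the dictionary's keys
        matches += 1

# Fraction of words recognized as English, between 0.0 and 1.0.
ratio = float(matches) / len(possible_words)
print(matches, ratio)  # 3 0.75
```

Because 'XQZ' is not a key in the stand-in dictionary, only three of the four words count as matches.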


Once the for loop has completed, the number of English words is stored in the matches
variable. Note that technically this is only the number of words that are recognized as English
because they existed in our dictionary text file. As far as the program is concerned, if the word
exists in dictionary.txt, then it is a real English word. And if it doesn’t exist in the dictionary file,
