302 http://inventwithpython.com/hacking
Email questions to the author: [email protected]
the letter frequency of each subkey’s decrypted text. (This will be explained more in the next
chapter.)
Matching Letter Frequencies
There are several algorithms we could use for “matching the letter frequency of regular
English.” The one used in our hacking program simply orders the letters from most frequent to
least frequent and then calculates what this book calls a frequency match score for that
ordering. The score for a string starts at 0. Each time one of the letters E, T, A, O, I, or N
appears among the six most frequent letters of the string, we add a point to the score. Each time
one of the letters V, K, J, X, Q, or Z appears among the six least frequent letters of the string,
we also add a point. The frequency match score for a string is therefore an integer from 0
(meaning the string’s letter frequency is completely unlike that of regular English) to 12
(meaning its six most frequent and six least frequent letters are the same as those of regular
English).
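The scoring rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not necessarily the code used in the hacking program: the function name freq_match_score and the constant names are assumptions made for this example, and the function expects to be given a string of all 26 letters already ordered from most frequent to least frequent.

```python
# Hypothetical names chosen for this sketch.
MOST_COMMON = "ETAOIN"    # six most frequent letters in regular English
LEAST_COMMON = "VKJXQZ"   # six least frequent letters in regular English

def freq_match_score(freq_order):
    """Return the frequency match score (0 to 12) for a 26-letter
    ordering that runs from most frequent to least frequent."""
    freq_order = freq_order.upper()
    score = 0
    # One point for each common letter found among the first six.
    for letter in MOST_COMMON:
        if letter in freq_order[:6]:
            score += 1
    # One point for each uncommon letter found among the last six.
    for letter in LEAST_COMMON:
        if letter in freq_order[-6:]:
            score += 1
    return score

# An ordering that begins with ETAOIN and ends with VKJXQZ earns every point.
print(freq_match_score("ETAOINSHRDLCUMWFGYPBVKJXQZ"))  # prints 12
```

Note that the position of a letter within the top six or bottom six does not matter, only whether it is present there.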
An Example of Calculating Frequency Match Score
For example, look at this ciphertext, which was encrypted with a simple substitution cipher:
“Sy l nlx sr pyyacao l ylwj eiswi upar lulsxrj isr sxrjsxwjr, ia esmm
rwctjsxsza sj wmpramh, lxo txmarr jia aqsoaxwa sr pqaceiamnsxu, ia esmm caytra
jp famsaqa sj. Sy, px jia pjiac ilxo, ia sr pyyacao rpnajisxu eiswi lyypcor l
calrpx ypc lwjsxu sx lwwpcolxwa jp isr sxrjsxwjr, ia esmm lwwabj sj aqax px jia
rmsuijarj aqsoaxwa. Jia pcsusx py nhjir sr agbmlsxao sx jisr elh. -Facjclxo
Ctrramm”
If we count how often each letter appears and then arrange the letters in order of frequency, we
end up with this ordering: ASRXJILPWMCYOUEQNTHBFZGKVD. That is, A is the most frequent
letter, S is the 2nd most frequent letter, and so on down to the letter D, which appears the least
frequently.
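One way to produce such an ordering is to count the letters and sort them by count. The sketch below is a minimal example under stated assumptions: the function name frequency_order is made up for illustration, and letters with equal counts are placed in alphabetical order. That tie-breaking rule is an assumption of this sketch, so for letters that occur equally often the result may differ from the ordering printed above.

```python
from collections import Counter
import string

def frequency_order(message):
    """Return all 26 letters ordered from most to least frequent in message.

    Ties are broken alphabetically (an assumption of this sketch)."""
    # Count only the letters A-Z, ignoring case, spaces, and punctuation.
    counts = Counter(ch for ch in message.upper() if ch in string.ascii_uppercase)
    # Sort by descending count; a Counter reports 0 for letters never seen.
    ordered = sorted(string.ascii_uppercase, key=lambda ch: (-counts[ch], ch))
    return "".join(ordered)

print(frequency_order("cccbba"))  # prints CBADEFGHIJKLMNOPQRSTUVWXYZ
```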
The six most frequent letters in this ordering are A, S, R, X, J, and I. Only two of these letters (A
and I) appear in the ETAOIN set of letters. The six least frequent letters in the ordering are F, Z,
G, K, V, and D. Only three of these letters (Z, K, and V) appear in the VKJXQZ set of letters. So
the frequency ordering ASRXJILPWMCYOUEQNTHBFZGKVD (which comes from the above
ciphertext) has a frequency match score of 5.
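The arithmetic in this example can be checked with Python sets. This short sketch (the variable names are chosen only for illustration) takes the ordering given above as its input and shows which letters earn the points:

```python
ordering = "ASRXJILPWMCYOUEQNTHBFZGKVD"

# Letters shared between the six most frequent letters and ETAOIN.
top_hits = sorted(set(ordering[:6]) & set("ETAOIN"))
# Letters shared between the six least frequent letters and VKJXQZ.
bottom_hits = sorted(set(ordering[-6:]) & set("VKJXQZ"))

score = len(top_hits) + len(bottom_hits)
print(top_hits)     # prints ['A', 'I']
print(bottom_hits)  # prints ['K', 'V', 'Z']
print(score)        # prints 5
```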