Chapter 20 – Frequency Analysis 307
# frequency taken from http://en.wikipedia.org/wiki/Letter_frequency
englishLetterFreq = {'E': 12.70, 'T': 9.06, 'A': 8.17, 'O': 7.51, 'I':
6.97, 'N': 6.75, 'S': 6.33, 'H': 6.09, 'R': 5.99, 'D': 4.25, 'L': 4.03, 'C':
2.78, 'U': 2.76, 'M': 2.41, 'W': 2.36, 'F': 2.23, 'G': 2.02, 'Y': 1.97, 'P':
1.93, 'B': 1.29, 'V': 0.98, 'K': 0.77, 'J': 0.15, 'X': 0.15, 'Q': 0.10, 'Z':
0.07}
The englishLetterFreq dictionary uses single-letter strings as its keys and floats giving each
letter’s percentage frequency as its values. (These values come from the Wikipedia
article for letter frequency: https://en.wikipedia.org/wiki/Letter_frequency) The
englishLetterFreq value isn’t actually used by our program. It is simply here for your
future reference in case you write a program that needs it.
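Since the program never reads englishLetterFreq, here is a minimal sketch of how such a table might be used in a program of your own. The expectedCount() helper is hypothetical and is not part of freqAnalysis.py; it only estimates how often a letter would turn up in typical English text of a given length.

```python
# Illustrative sketch only: expectedCount() is a hypothetical helper,
# not a function from freqAnalysis.py.
englishLetterFreq = {'E': 12.70, 'T': 9.06, 'A': 8.17, 'O': 7.51, 'I': 6.97,
                     'N': 6.75, 'S': 6.33, 'H': 6.09, 'R': 5.99, 'D': 4.25,
                     'L': 4.03, 'C': 2.78, 'U': 2.76, 'M': 2.41, 'W': 2.36,
                     'F': 2.23, 'G': 2.02, 'Y': 1.97, 'P': 1.93, 'B': 1.29,
                     'V': 0.98, 'K': 0.77, 'J': 0.15, 'X': 0.15, 'Q': 0.10,
                     'Z': 0.07}

def expectedCount(letter, textLength):
    # The values are percentages, so divide by 100 before scaling
    # by the number of letters in the text.
    return englishLetterFreq[letter.upper()] / 100 * textLength

# Look up one letter's percentage directly.
print(englishLetterFreq['E'])

# Roughly how many E's to expect in 1,000 letters of ordinary English.
print(expectedCount('e', 1000))

# The rounded percentages add up to slightly less than 100.
print(round(sum(englishLetterFreq.values()), 2))
```

Because the table holds rounded percentages, treat anything computed from it as an estimate rather than an exact count.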
The Most Common Letters, “ETAOIN”
freqAnalysis.py
8. ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'
Line 8 creates a variable named ETAOIN that holds the 26 letters of the alphabet
in order from most frequent to least frequent. The word ETAOIN is a handy way to remember the
six most common letters in English. Of course, this ordering isn’t always going to be perfect. You
could easily find a book that has a set of letter frequencies where Z is used more often than Q, for
example. Gadsby by Ernest Vincent Wright is a novel that never uses the letter E, which gives it a
very odd set of letter frequencies. But in most cases, the “ETAOIN order” will be accurate.
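As a quick check, the ETAOIN string can be rebuilt from the percentage table by sorting the letters from highest frequency to lowest. This is only a sketch to show where the ordering comes from; the derivedOrder name is made up for this example, and freqAnalysis.py itself simply hard-codes the string.

```python
englishLetterFreq = {'E': 12.70, 'T': 9.06, 'A': 8.17, 'O': 7.51, 'I': 6.97,
                     'N': 6.75, 'S': 6.33, 'H': 6.09, 'R': 5.99, 'D': 4.25,
                     'L': 4.03, 'C': 2.78, 'U': 2.76, 'M': 2.41, 'W': 2.36,
                     'F': 2.23, 'G': 2.02, 'Y': 1.97, 'P': 1.93, 'B': 1.29,
                     'V': 0.98, 'K': 0.77, 'J': 0.15, 'X': 0.15, 'Q': 0.10,
                     'Z': 0.07}
ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'

# Sort the dictionary's keys by their percentage, highest first.
# Python's sort is stable, so the tied letters J and X (both 0.15)
# keep the order they have in the dictionary.
derivedOrder = ''.join(sorted(englishLetterFreq,
                              key=englishLetterFreq.get,
                              reverse=True))

print(derivedOrder == ETAOIN)  # True
print(ETAOIN[:6])              # ETAOIN
```

The tie between J and X is a reminder that the tail of the ordering is the least reliable part: small differences in the sample text can swap neighboring letters.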
freqAnalysis.py
9. LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
Our module will also need a string of all the uppercase letters of the alphabet for a few different
functions, so we set the LETTERS constant variable on line 9.
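A string like LETTERS is convenient for membership tests with the in operator. The sketch below is an assumption about typical usage, not code from freqAnalysis.py: it confirms that ETAOIN contains the same 26 letters as LETTERS, and it uses LETTERS to drop everything from a message that is not an uppercase letter. The message and lettersOnly names are made up for this example.

```python
ETAOIN = 'ETAOINSHRDLCUMWFGYPBVKJXQZ'
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# ETAOIN is just LETTERS in a different order, so sorting it
# should give back the alphabet.
print(sorted(ETAOIN) == list(LETTERS))  # True

# Keep only the characters that are uppercase letters.
message = 'Hello, world!'
lettersOnly = ''
for character in message.upper():
    if character in LETTERS:
        lettersOnly += character

print(lettersOnly)  # HELLOWORLD
```

Calling upper() first matters here, because LETTERS holds only uppercase letters and 'h' in LETTERS would be False.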
The Program’s getLetterCount() Function
freqAnalysis.py
13. def getLetterCount(message):
14.     # Returns a dictionary with keys of single letters and values of the
15.     # count of how many times they appear in the message parameter.
16.     letterCount = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 0, 'F': 0, 'G': 0,
'H': 0, 'I': 0, 'J': 0, 'K': 0, 'L': 0, 'M': 0, 'N': 0, 'O': 0, 'P': 0, 'Q': 0,
'R': 0, 'S': 0, 'T': 0, 'U': 0, 'V': 0, 'W': 0, 'X': 0, 'Y': 0, 'Z': 0}