        return cmp(w1,w2)
    else:
        return 1
This function accepts two parameters, each of which is a tuple of two values. Notice I have taken
advantage of Python's automatic tuple unpacking and written each parameter as a pair of variables.
Take a look at the decision structure. If the count in the first item is greater than the count in the
second item, then the first item should precede the second in the sorted list (since we want the most
frequent words at the front of the list) and the function returns -1. If the counts are equal, then we
let Python compare the two word strings with cmp. This ensures that groups of words with the same
frequency will appear in alphabetical order relative to each other. Finally, the else handles the case
when the second count is larger; the function returns a 1 to indicate that the first parameter should
follow the second.
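The three branches of the decision structure can be tried out one at a time. The sketch below is written for Python 3, which removed both cmp and tuple-unpacking parameters, so it unpacks each pair inside the body and uses a small arithmetic stand-in for cmp. The name compare_items and the sample pairs are illustrative assumptions, not part of the original program.

```python
def compare_items(item1, item2):
    # Each item is a (word, count) pair. Python 3 no longer allows
    # tuple parameters, so the pairs are unpacked inside the body.
    w1, c1 = item1
    w2, c2 = item2
    if c1 > c2:
        return -1   # first item has the larger count: it goes first
    elif c1 == c2:
        # Stand-in for cmp(w1, w2), which Python 3 removed:
        # evaluates to -1, 0, or 1
        return (w1 > w2) - (w1 < w2)
    else:
        return 1    # second item has the larger count: first item follows

# Hypothetical (word, count) pairs, for illustration only
print(compare_items(("the", 3), ("cat", 1)))   # larger count first
print(compare_items(("a", 3), ("the", 3)))     # tie broken alphabetically
print(compare_items(("cat", 1), ("sat", 2)))   # smaller count second
```

A negative result means the first pair belongs earlier in the sorted list, and a positive result means it belongs later.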
With this comparison function, it is now a simple matter to sort our items into the correct order.
items.sort(compareItems)
Notice here I have used just the name of the function as the parameter to sort. When a function name
is used like this (without any trailing parentheses), it tells Python that the function object itself is
being referred to. As you know, a function name followed by parentheses tells Python to call the
function. In this case, we are not calling the function, but rather sending the function object to the
sort method to let it do the calling.
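The difference between naming a function and calling it can be seen in a small sketch. It assumes Python 3, where sort accepts a key function rather than a comparison function, so the ordering rule is expressed differently from the program in this section. The names by_count and sample_items are hypothetical and chosen only for illustration.

```python
def by_count(item):
    # item is a (word, count) pair. Negating the count puts larger
    # counts first; the word breaks ties alphabetically.
    word, count = item
    return (-count, word)

# Hypothetical data, for illustration only
sample_items = [("the", 3), ("cat", 1), ("a", 3), ("sat", 2)]

# With parentheses, the function is called and we get its result.
called = by_count(("cat", 1))
print(called)

# Without parentheses, by_count is the function object itself.
# sort receives the object and does the calling, once per item.
sample_items.sort(key=by_count)
print(sample_items)
```

Writing sample_items.sort(key=by_count(...)) instead would call the function first and hand sort a tuple, which is not what sort needs.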
Now that our items are sorted in order from most to least frequent, we are ready to print a report of
the n most frequent words. Here's a loop that does the trick:
for i in range(n):
    print "%-10s%5d" % items[i]
Notice especially the formatted print statement. It prints a string, left-justified in ten spaces,
followed by an int right-justified in five spaces. Normally, we would supply a pair of values to fill in
the slots (e.g., print "%-10s%5d" % (word,count)). In this case, however, items[i] is a pair, so
Python can extract the two values that it needs.
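A short sketch makes the equivalence concrete. It builds the formatted string instead of printing it directly, so that it runs under Python 3 as well; the pair ("python", 42) is made-up sample data.

```python
# Hypothetical (word, count) pair, for illustration only
item = ("python", 42)

# Supplying the two values explicitly...
explicit = "%-10s%5d" % ("python", 42)

# ...gives the same result as handing over the pair itself.
from_pair = "%-10s%5d" % item

# repr shows the padding: the word fills ten columns, the count five
print(repr(from_pair))
print(len(from_pair))
```

Because every line of the report is fifteen characters wide, the words and counts line up in two neat columns.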
That about does it. Here is the complete program:
# wordfreq.py
import string

def compareItems((w1,c1), (w2,c2)):
    if c1 > c2:
        return -1
    elif c1 == c2:
        return cmp(w1,w2)
    else:
        return 1

def main():
    print "This program analyzes word frequency in a file"
    print "and prints a report on the n most frequent words.\n"

    # get the sequence of words from the file
    fname = raw_input("File to analyze: ")
    text = open(fname,'r').read()
    text = string.lower(text)
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = string.replace(text, ch, ' ')
    words = string.split(text)

    # construct a dictionary of word counts