Python Programming: An Introduction to Computer Science

11.6. NON-SEQUENTIAL COLLECTIONS 197

        return cmp(w1,w2)
    else:
        return 1

This function accepts two parameters, each of which is a tuple of two values. Notice I have taken
advantage of Python’s automatic tuple unpacking and written each parameter as a pair of variables.
Take a look at the decision structure. If the count in the first item is greater than the count in
the second item, then the first item should precede the second in the sorted list (since we want the
most frequent words at the front of the list) and the function returns -1. If the counts are equal,
then we let Python compare the two word strings with cmp. This ensures that groups of words with the
same frequency will appear in alphabetical order relative to each other. Finally, the else handles
the case when the second count is larger; the function returns a 1 to indicate that the first
parameter should follow the second.
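The comparison function described here relies on cmp and on tuple parameters, both of which exist only in Python 2. As a hedged sketch for readers on Python 3, the same ordering can be expressed by unpacking each pair inside the function and wrapping it with functools.cmp_to_key. The name compare_items and the sample data are illustrative assumptions, not part of the original program.

```python
from functools import cmp_to_key

def compare_items(item1, item2):
    # Unpack each (word, count) pair by hand; Python 3 does not
    # allow tuple parameters in the def line.
    w1, c1 = item1
    w2, c2 = item2
    if c1 > c2:
        return -1          # larger count sorts first
    elif c1 == c2:
        # Stand-in for Python 2's cmp(w1, w2): yields -1, 0, or 1.
        return (w1 > w2) - (w1 < w2)
    else:
        return 1           # smaller count sorts later

# Hypothetical sample data: (word, count) pairs.
items = [("spam", 2), ("eggs", 5), ("ham", 2), ("bacon", 5)]
items.sort(key=cmp_to_key(compare_items))
```

Under the same assumptions, a key-based call such as items.sort(key=lambda pair: (-pair[1], pair[0])) should give the same order without a comparison function.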
With this comparison function, it is now a simple matter to sort our items into the correct order.

items.sort(compareItems)

Notice here I have used just the name of the function as the parameter to sort. When a function name
is used like this (without any trailing parentheses), it tells Python that the function object itself
is being referred to. As you know, a function name followed by parentheses tells Python to call the
function. In this case, we are not calling the function, but rather sending the function object to
the sort method to let it do the calling.
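The difference between naming a function and calling it can be seen in a small sketch. It is written for Python 3, and the function by_length and the word list are hypothetical names chosen only for illustration.

```python
def by_length(word):
    # Key function: sorted will call this once per element.
    return len(word)

words = ["banana", "fig", "cherry"]

# No parentheses: the function object itself is handed to sorted,
# which does the calling.
ordered = sorted(words, key=by_length)

# With parentheses: the function runs right here and yields an int.
length = by_length("fig")
```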
Now that our items are sorted in order from most to least frequent, we are ready to print a report of
the n most frequent words. Here’s a loop that does the trick:

for i in range(n):
    print "%-10s%5d" % items[i]

Notice especially the formatted print statement. It prints a string, left-justified in ten spaces
followed by an int right-justified in five spaces. Normally, we would supply a pair of values to fill
in the slots (e.g., print "%-10s%5d" % (word,count)). In this case, however, items[i] is a pair, so
Python can extract the two values that it needs.
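To make the slot filling concrete, here is a minimal sketch in Python 3 syntax, where print is a function rather than a statement. The pair ("the", 42) is made-up sample data standing in for items[i].

```python
# Hypothetical (word, count) pair standing in for items[i].
pair = ("the", 42)

# Explicit pair of values filling the two slots.
explicit = "%-10s%5d" % ("the", 42)

# A tuple on the right-hand side fills the slots the same way.
from_pair = "%-10s%5d" % pair

print(from_pair)
```

Both strings are 15 characters wide: the word padded on the right to ten columns, then the count padded on the left to five.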
That about does it. Here is the complete program:

wordfreq.py

import string

def compareItems((w1,c1), (w2,c2)):
    if c1 > c2:
        return -1
    elif c1 == c2:
        return cmp(w1,w2)
    else:
        return 1

def main():
    print "This program analyzes word frequency in a file"
    print "and prints a report on the n most frequent words.\n"

    # get the sequence of words from the file
    fname = raw_input("File to analyze: ")
    text = open(fname,'r').read()
    text = string.lower(text)
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = string.replace(text, ch, ' ')
    words = string.split(text)

    # construct a dictionary of word counts
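The word-extraction step in the listing uses raw_input and the string module functions string.lower, string.replace, and string.split, which are Python 2 features. As a rough sketch only, the same step might look like this in Python 3 using string methods. The helper name extract_words and the sample sentence are assumptions introduced for illustration, and the sketch covers only the text cleanup, not the rest of the program.

```python
# Same punctuation characters as the loop in the listing.
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def extract_words(text):
    # Lowercase so "The" and "the" are treated as the same word.
    text = text.lower()
    # Replace each punctuation character with a space.
    for ch in PUNCTUATION:
        text = text.replace(ch, " ")
    # Split on runs of whitespace to get the list of words.
    return text.split()

# Hypothetical sample text in place of a file's contents.
words = extract_words("Hello, world! Hello again.")
```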
