196 CHAPTER11. DATA COLLECTIONS
counts = {}
for w in words:
    try:
        counts[w] = counts[w] + 1
    except KeyError:
        counts[w] = 1
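As a self-contained sketch of the loop above, the hypothetical function count_words below builds the word list by splitting a string on whitespace. That splitting rule is an assumption, since the chapter's own words list may be produced differently. The counting logic uses the same try/except KeyError idea, and the sketch is written for Python 3.

```python
def count_words(text):
    """Count occurrences of each whitespace-separated word in text.

    Hypothetical helper: splitting with str.split() is an assumption
    made for this sketch, not necessarily how the chapter builds words.
    """
    counts = {}
    for w in text.split():
        try:
            # Word already seen: bump its count.
            counts[w] = counts[w] + 1
        except KeyError:
            # First occurrence: the lookup raised KeyError, so start at 1.
            counts[w] = 1
    return counts


counts = count_words("spam eggs spam bacon spam eggs")
print(counts["spam"], counts["eggs"], counts["bacon"])  # 3 2 1
```

An equivalent, shorter loop body is counts[w] = counts.get(w, 0) + 1, because dict.get returns its second argument when the key is missing.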
Our last step is to print a report that summarizes the contents of counts. One approach might be to print out the list of words and their associated counts in alphabetical order. Here's how that could be done:
# get list of words that appear in document
uniqueWords = counts.keys()

# put list of words in alphabetical order
uniqueWords.sort()

# print words and associated counts
for w in uniqueWords:
    print w, counts[w]
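A Python 3 sketch of the same alphabetical report follows. It assumes Python 3, where counts.keys() returns a view object that has no sort method, so the built-in sorted is used instead. The helper name alphabetical_report is hypothetical, and it returns the report lines rather than printing them so the result is easy to check.

```python
def alphabetical_report(counts):
    """Return one 'word count' line per word, in alphabetical order."""
    lines = []
    # sorted() builds a new, sorted list from the dictionary's keys.
    for w in sorted(counts):
        lines.append("%s %d" % (w, counts[w]))
    return lines


for line in alphabetical_report({"spam": 3, "eggs": 2, "bacon": 1}):
    print(line)
# bacon 1
# eggs 2
# spam 3
```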
For a large document, however, this is unlikely to be useful. There will be far too many words, most of which only appear a few times. A more interesting analysis is to print out the counts for the n most frequent words in the document. In order to do that, we will need to create a list that is sorted by counts (most to fewest) and then select the first n items in the list.
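The plan just described (sort by count, then take the first n items) can be sketched in Python 3 as follows. Note that this sketch swaps in the key and reverse arguments of sorted rather than the comparison-function approach developed in this section; most_frequent is a hypothetical name.

```python
def most_frequent(counts, n):
    """Return the n (word, count) pairs with the largest counts, largest first."""
    # Sort the pairs by their second component (the count), most to fewest.
    pairs = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    # The first n entries of the sorted list are the n most frequent words.
    return pairs[:n]


print(most_frequent({"foo": 5, "bar": 7, "spam": 376}, 2))
# [('spam', 376), ('bar', 7)]
```

If several words share a count, they keep the order in which the dictionary produced them, because Python's sort is stable even with reverse=True.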
We can start by getting a list of key-value pairs using the items method for dictionaries.
items = counts.items()
Here items will be a list of tuples (e.g., [('foo', 5), ('bar', 7), ('spam', 376), ...]). If we simply sort this list (items.sort()), Python will put them in a standard order. Unfortunately, when Python compares tuples, it orders them by components, left to right. Since the first component of each pair is the word, items.sort() will put this list in alphabetical order, which is not what we want.
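A quick check of this behavior, using the sample pairs from the text (written for Python 3, where print is a function):

```python
items = [("foo", 5), ("bar", 7), ("spam", 376)]

# Tuples compare component by component, left to right, so the word
# (first component) decides the order; counts only break ties.
items.sort()
print(items)  # [('bar', 7), ('foo', 5), ('spam', 376)]

# Two pairs with the same word fall back to comparing the counts.
print(("foo", 2) < ("foo", 5))  # True
```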
In order to put our pair list in the proper order, we need to investigate the sorting method for lists a bit more carefully. When we first covered the sort method, I mentioned that it can take a comparison function as an optional parameter. We can use this feature to tell Python how to sort the list of pairs.
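In Python 3, list.sort no longer accepts a comparison function directly; the standard library's functools.cmp_to_key wraps one so it can be passed as the key argument (cmp_to_key is also available in Python 2.7). The sketch below uses a hypothetical comparison function, descending, to show a comparison function controlling the order of a plain list of numbers.

```python
from functools import cmp_to_key


def descending(a, b):
    """Comparison function: negative if a should come first, positive if b should."""
    if a > b:
        return -1   # larger value goes first
    elif a == b:
        return 0
    else:
        return 1


nums = [3, 1, 4, 1, 5, 9, 2, 6]
nums.sort(key=cmp_to_key(descending))
print(nums)  # [9, 6, 5, 4, 3, 2, 1, 1]
```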
If no comparison function is given, Python orders a list according to the built-in function cmp. This function accepts two values as parameters and returns -1, 0 or 1, corresponding to the relative ordering of the parameters. Thus, cmp(a,b) returns -1 if a precedes b, 0 if they are the same, and 1 if a follows b. Here are a few examples.
>>> cmp(1,2)
-1
>>> cmp("a","b")
-1
>>> cmp(3,1)
1
>>> cmp(3.1,3.1)
0
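Python 3 removed the built-in cmp. To reproduce the session above there, a common stand-in (defined here by hand, not a built-in) relies on True and False behaving as 1 and 0 in arithmetic:

```python
def cmp(a, b):
    """Stand-in for Python 2's built-in cmp: returns -1, 0 or 1."""
    # (a > b) and (a < b) are booleans; subtracting them yields -1, 0 or 1.
    return (a > b) - (a < b)


print(cmp(1, 2))        # -1
print(cmp("a", "b"))    # -1
print(cmp(3, 1))        # 1
print(cmp(3.1, 3.1))    # 0
```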
To sort our list of items, we need a comparison function that takes two items (i.e., word-count pairs) and returns either -1, 0 or 1, giving the relative order in which we want those two items to appear in the sorted list. Here is the code for a suitable comparison function:
def compareItems((w1,c1), (w2,c2)):
    if c1 > c2:
        return -1
    elif c1 == c2: