Python Programming: An Introduction to Computer Science

11.6. NON-SEQUENTIAL COLLECTIONS 195

At the highest level, this is just a multi-accumulator problem. We need a count for each word that appears
in the document. We can use a loop that iterates through each word in the document and adds one to the
appropriate count. The only catch is that we will need hundreds or thousands of accumulators, one for each
unique word in the document. This is where a (Python) dictionary comes in handy.
We will use a dictionary where the keys are strings representing words in the document and the values are
ints that count how many times the word appears. Let’s call our dictionary counts. To update the count
for a particular word, w, we just need a line of code something like this:

counts[w] = counts[w] + 1

This says to set the count associated with word w to be one more than the current count for w.
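As a small illustration of that update, here is a sketch that assumes the word is already a key in the dictionary. The starting contents of counts and the word "foo" are made up for the example.

```python
# Hypothetical starting state: "foo" has already been seen three times.
counts = {"foo": 3}
w = "foo"

# Look up the current count for w, add one, and store the result back under w.
counts[w] = counts[w] + 1

print(counts[w])  # 4
```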
There is one small complication with using a dictionary here. The first time we encounter a word, it will
not yet be in counts. Attempting to access a non-existent key produces a run-time KeyError. To guard
against this, we need a decision in our algorithm.

if w is already in counts:
    add one to the count for w
else:
    set count for w to 1

This decision ensures that the first time a word is encountered, it will be entered into the dictionary with a
count of 1.
One way to implement this decision is to use the has_key method for dictionaries.

if counts.has_key(w):
    counts[w] = counts[w] + 1
else:
    counts[w] = 1
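The has_key method is specific to Python 2. It was removed in Python 3, where the in operator performs the same membership test. The following is a minimal sketch of the same decision, assuming Python 3. The helper name add_word is hypothetical and not part of the original program.

```python
def add_word(counts, w):
    """Record one occurrence of word w in the dictionary counts."""
    if w in counts:
        # w has been seen before: add one to its existing count.
        counts[w] = counts[w] + 1
    else:
        # First time w is seen: enter it with a count of 1.
        counts[w] = 1

counts = {}
add_word(counts, "foo")
add_word(counts, "foo")
add_word(counts, "bar")
print(counts)  # {'foo': 2, 'bar': 1}
```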

Another approach is to use a try-except to catch the error.

try:
    counts[w] = counts[w] + 1
except KeyError:
    counts[w] = 1
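To see the try-except style at work on a first and a repeated occurrence, here is a short sketch. The helper name add_word_try is hypothetical. The code runs under Python 3, and this particular fragment should behave the same way in Python 2.

```python
def add_word_try(counts, w):
    """Record one occurrence of w, relying on KeyError for unseen words."""
    try:
        # Reading counts[w] raises KeyError if w is not yet a key,
        # so the assignment only happens for words already present.
        counts[w] = counts[w] + 1
    except KeyError:
        # First occurrence of w: start its count at 1.
        counts[w] = 1

counts = {}
add_word_try(counts, "foo")   # takes the except branch
add_word_try(counts, "foo")   # takes the try branch
print(counts)  # {'foo': 2}
```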

This is a common pattern in programs that use dictionaries, and both of these coding styles are used.
The dictionary updating code will form the heart of our program. We just need to fill in the parts around
it. The first task is to split our text document into a sequence of words. In the process, we will also convert
all the text to lower case (so occurrences of “Foo” match “foo”) and eliminate punctuation (so “foo,” matches
“foo”). Here’s the code to do that:

import string

fname = raw_input("File to analyze: ")

# read file as one long string
text = open(fname, 'r').read()

# convert all letters to lower case
text = string.lower(text)

# replace each punctuation character with a space
for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
    text = string.replace(text, ch, ' ')

# split string at whitespace to form a list of words
words = string.split(text)
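The code above is written for Python 2: raw_input and the string-module functions lower, replace, and split are not available in Python 3. As a rough equivalent, the sketch below applies the same three steps with string methods, assuming Python 3. It works on a string passed in directly rather than on a file, and the names PUNCTUATION and split_words are hypothetical.

```python
# Punctuation characters to blank out, matching the list used in the text.
PUNCTUATION = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~'

def split_words(text):
    """Return the lower-cased words of text with punctuation removed."""
    # Convert all letters to lower case so "Foo" matches "foo".
    text = text.lower()
    # Replace each punctuation character with a space so "foo," matches "foo".
    for ch in PUNCTUATION:
        text = text.replace(ch, ' ')
    # Split at whitespace to form a list of words.
    return text.split()

print(split_words('Foo, foo. BAR!'))  # ['foo', 'foo', 'bar']
```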

Now we can easily loop through the words to build the counts dictionary.
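One way such a loop might look is sketched below, assuming Python 3 and a list of already-normalized words. The function name count_words and the sample list are hypothetical, and the membership test could equally be replaced by the try-except style.

```python
def count_words(words):
    """Build a dictionary mapping each word to its number of occurrences."""
    counts = {}
    for w in words:
        if w in counts:
            # Word already recorded: add one to its count.
            counts[w] = counts[w] + 1
        else:
            # First occurrence: enter the word with a count of 1.
            counts[w] = 1
    return counts

words = ['foo', 'bar', 'foo']   # made-up sample input
print(count_words(words))  # {'foo': 2, 'bar': 1}
```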

