Functional Python Programming

Chapter 10

The following is the same feature implemented with the itertools.groupby() function:


from itertools import groupby

def partition_s(iterable, key=lambda x: x):
    """Sort required"""
    return groupby(iterable, key)


The important difference in the inputs to each function is that the groupby()
version requires data already sorted by the key, whereas the defaultdict version doesn't
require sorting. For very large sets of data, the sort can be expensive, measured in both time
and storage. The final sort of the keys in the defaultdict version does create an intermediate
list object, but this object might not be as large as the original set of data, depending on
the cardinality of the keys.
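
The sorting requirement can be seen in a minimal sketch. The sample data and the helper name first are hypothetical and are not part of the original example; the sketch only shows how groupby() behaves on unsorted versus sorted input:

```python
from itertools import groupby

# Hypothetical sample data: (key, value) pairs, deliberately unsorted.
data = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# Hypothetical helper: extract the key from a (key, value) pair.
first = lambda pair: pair[0]

# Without sorting, groupby() starts a new group every time the key changes.
unsorted_keys = [k for k, _ in groupby(data, key=first)]

# Sorting by the same key first yields one group per distinct key.
sorted_keys = [k for k, _ in groupby(sorted(data, key=first), key=first)]

print(unsorted_keys)  # ['b', 'a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

On unsorted input, each key appears once per run of adjacent equal keys, so a later summary would see several partial groups for the same key.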

We can summarize the grouped data as follows:


mean = lambda seq: sum(seq) / len(seq)
var = lambda mean, seq: sum((x - mean)**2 for x in seq) / len(seq)


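def summarize(key_iter):
    # Each partition is a (key, iterator) two-tuple.
    key, item_iter = key_iter
    # Keep only the values from the original (key, value) pairs.
    values = tuple(v for k, v in item_iter)
    μ = mean(values)
    return key, μ, var(μ, values)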


The result of either partition function is a sequence of (key, iterator)
two-tuples. The summarize() function separates the key from the item iterator.
Each item in the item iterator is one of the original objects in the source data;
these are (key, value) pairs. We only want the values, so we've used a simple
generator expression to separate the source keys from the values.
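
As a small illustration of the unpacking step, the following sketch builds one (key, iterator) two-tuple by hand. The pair key_iter and its contents are hypothetical sample data, not output taken from the original example:

```python
# Hypothetical partition result for a single key:
# the key, plus an iterator over the original (key, value) pairs.
key_iter = ("a", iter([("a", 2), ("a", 4), ("a", 9)]))

# Separate the key from the item iterator.
key, item_iter = key_iter

# Drop the repeated source key and keep only the values.
values = tuple(v for k, v in item_iter)

print(key, values)  # a (2, 4, 9)

# The iterator is now exhausted; a second pass yields nothing.
leftover = tuple(item_iter)
print(leftover)  # ()
```

Materializing the values as a tuple matters because they are traversed more than once: sum() and len() are both applied for the mean, and the values are read again for the variance.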


Alternatively, we can use the map() function to pick the second item from each of the
two-tuples:

map(snd, item_iter)

This requires the snd = lambda x: x[1] function.
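
A minimal sketch of the map() variant follows. The sample pairs are hypothetical; operator.itemgetter(1) is shown only as a standard-library alternative to snd and is not used in the original text:

```python
from operator import itemgetter

# Pick the second item of a two-tuple.
snd = lambda x: x[1]

# Hypothetical (key, value) pairs for a single key.
pairs = [("a", 2), ("a", 4), ("a", 9)]

# map() is lazy, so the result is collected into a tuple.
values = tuple(map(snd, iter(pairs)))

# Assumed alternative: itemgetter(1) does the same job without a lambda.
values_alt = tuple(map(itemgetter(1), pairs))

print(values)      # (2, 4, 9)
print(values_alt)  # (2, 4, 9)
```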


We can use the following commands to apply the summarize() function to
each partition:

partition1 = partition(list(data), key=lambda x: x[0])
groups = map(summarize, partition1)

The alternative commands, using the groupby()-based partition_s() function, are as follows:

partition2 = partition_s(sorted(data), key=lambda x: x[0])
groups = map(summarize, partition2)
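
The two pipelines can be compared end to end in a self-contained sketch. Several parts are assumptions: the body of partition() is a plausible defaultdict-based version (its definition lies outside this section), var is taken to be the population variance, and the sample data is hypothetical:

```python
from collections import defaultdict
from itertools import groupby

def partition(iterable, key=lambda x: x):
    """Sort not required (assumed defaultdict-based version)."""
    groups = defaultdict(list)
    for row in iterable:
        groups[key(row)].append(row)
    # Only the distinct keys are sorted, not the full data set.
    for k in sorted(groups):
        yield k, iter(groups[k])

def partition_s(iterable, key=lambda x: x):
    """Sort required"""
    return groupby(iterable, key)

mean = lambda seq: sum(seq) / len(seq)
# Assumption: population variance.
var = lambda mean, seq: sum((x - mean)**2 for x in seq) / len(seq)

def summarize(key_iter):
    key, item_iter = key_iter
    values = tuple(v for k, v in item_iter)
    μ = mean(values)
    return key, μ, var(μ, values)

# Hypothetical sample data: unsorted (key, value) pairs.
data = [("b", 1), ("a", 2), ("b", 3), ("a", 4), ("a", 9)]

partition1 = partition(list(data), key=lambda x: x[0])
result1 = list(map(summarize, partition1))

partition2 = partition_s(sorted(data), key=lambda x: x[0])
result2 = list(map(summarize, partition2))

for row in result1:
    print(row)
print(result1 == result2)  # True
```

Because map() is lazy, each call is wrapped in list() here to force evaluation. With groupby(), each group must be consumed before moving on to the next one, which summarize() does by building its tuple of values immediately.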



