Functional Python Programming

Chapter 10

The following is the same feature implemented with the itertools.groupby() function:

from itertools import groupby

def partition_s(iterable, key=lambda x: x):
    """Sort required"""
    return groupby(iterable, key)

The important difference between the two functions lies in their inputs: the groupby() version requires data already sorted by the key, whereas the defaultdict version doesn't require sorting. For very large sets of data, the sort can be expensive in both time and storage. The defaultdict version's final sort of the keys does create an intermediate list object, but this object might be much smaller than the original set of data, depending on the cardinality of the keys.
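The sort requirement can be seen in a minimal sketch. The sample data and the names first, unsorted_runs, and sorted_runs are illustrative assumptions, not part of the original example:

```python
from itertools import groupby

# Illustrative (key, value) pairs; not taken from the chapter's data set.
data = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
first = lambda pair: pair[0]

# On unsorted input, groupby() starts a new group at every change of key,
# so the same key can appear in several separate runs.
unsorted_runs = [(k, [v for _, v in g]) for k, g in groupby(data, first)]

# Sorting by the key first yields exactly one group per distinct key.
sorted_runs = [
    (k, [v for _, v in g])
    for k, g in groupby(sorted(data, key=first), first)
]

print(unsorted_runs)  # four single-item runs
print(sorted_runs)    # one group for "a", one for "b"
```

With only two distinct keys here, a sorted list of the keys would hold two items regardless of how many rows the data contains, which is the cardinality point made above.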

We can summarize the grouped data as follows:

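mean = lambda seq: sum(seq) / len(seq)

# Note: each squared deviation is divided by the mean, not by len(seq).
var = lambda mean, seq: sum((x - mean) ** 2 / mean for x in seq)

def summarize(key_iter):
    # Unpack one (key, iterator) pair produced by a partition function.
    key, item_iter = key_iter
    # Keep only the values from the (key, value) source pairs.
    values = tuple(v for k, v in item_iter)
    μ = mean(values)
    return key, μ, var(μ, values)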

The result of either partition function will be a sequence of (key, iterator)
two-tuples. We separate the key from the item iterator. Each item in the item
iterator is one of the original objects in the source data; these are (key, value)
pairs. We only want the values, so we've used a simple generator expression to
separate the source keys from the values.
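As a small worked example, summarize() can be applied to a single hand-built (key, iterator) pair. The pair below is an assumption made up for illustration; the mean, var, and summarize definitions repeat the ones in the text so that the sketch runs on its own:

```python
mean = lambda seq: sum(seq) / len(seq)
# Same formula as in the text: squared deviations, each divided by the mean.
var = lambda mean, seq: sum((x - mean) ** 2 / mean for x in seq)

def summarize(key_iter):
    key, item_iter = key_iter
    values = tuple(v for k, v in item_iter)
    mu = mean(values)
    return key, mu, var(mu, values)

# One partition: the key "a" and an iterator over its (key, value) pairs.
one_group = ("a", iter([("a", 2), ("a", 6)]))
result = summarize(one_group)
print(result)  # ('a', 4.0, 2.0)
```

The values 2 and 6 have a mean of 4.0; each squared deviation is 4, and dividing each by the mean gives 1.0, for a total of 2.0.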

We can also use the following expression to pick the second item from each of the
two-tuples:

map(snd, item_iter)

This requires the snd = lambda x: x[1] function.
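A brief sketch of this alternative follows. The sample pairs are illustrative assumptions. The standard library's operator.itemgetter(1) is shown alongside as an equivalent selector, which is a substitution on our part and not something the text uses:

```python
from operator import itemgetter

snd = lambda x: x[1]

# Illustrative (key, value) pairs standing in for one partition's items.
pairs = [("a", 2), ("a", 6)]

# map() is lazy, so tuple() is used to materialize the values.
values = tuple(map(snd, iter(pairs)))

# operator.itemgetter(1) selects the same element without a lambda.
values_alt = tuple(map(itemgetter(1), iter(pairs)))

print(values)      # (2, 6)
print(values_alt)  # (2, 6)
```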

We can use the following code to apply the summarize() function to
each partition:

partition1 = partition(list(data), key=lambda x: x[0])
groups = map(summarize, partition1)

The alternative, using the groupby() version on sorted data, is as follows:

partition2 = partition_s(sorted(data), key=lambda x: x[0])
groups = map(summarize, partition2)
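The two pipelines can be compared end to end in a short sketch. The defaultdict-based partition() is not shown in this section, so the version below is an assumption about its shape; the sample data is also made up for illustration:

```python
from collections import defaultdict
from itertools import groupby

def partition(iterable, key=lambda x: x):
    """Assumed defaultdict-based version; no sort required."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    # Emit the groups in key order as (key, iterator) pairs.
    return ((k, iter(groups[k])) for k in sorted(groups))

def partition_s(iterable, key=lambda x: x):
    """Sort required"""
    return groupby(iterable, key)

mean = lambda seq: sum(seq) / len(seq)
var = lambda mean, seq: sum((x - mean) ** 2 / mean for x in seq)

def summarize(key_iter):
    key, item_iter = key_iter
    values = tuple(v for k, v in item_iter)
    mu = mean(values)
    return key, mu, var(mu, values)

# Illustrative unsorted (key, value) data.
data = [("b", 1), ("a", 2), ("b", 3), ("a", 6)]

groups1 = list(map(summarize, partition(list(data), key=lambda x: x[0])))
groups2 = list(map(summarize, partition_s(sorted(data), key=lambda x: x[0])))

print(groups1)  # [('a', 4.0, 2.0), ('b', 2.0, 1.0)]
print(groups2)  # [('a', 4.0, 2.0), ('b', 2.0, 1.0)]
```

Because map() is lazy, each group is summarized before groupby() advances to the next key. This matters for the second pipeline: the itertools documentation notes that a group is no longer visible once the groupby object moves on, so the groups should be consumed in order and not collected first.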
