Recursions and Reductions
Grouping or partitioning data by key values
There are no limits to the kinds of reductions we might want to apply to grouped
data. We might have data with a number of independent and dependent variables.
We can consider partitioning the data by an independent variable and computing
summaries like maximum, minimum, average, and standard deviation of the values
in each partition.
The essential trick to doing more sophisticated reductions is to collect all of the data
values into each group. The Counter() function merely collects counts of identical
items. We want to create sequences of the original items based on a key value.
Looked at in a more general way, each 5-mile bin will contain the entire collection of
legs of that distance, not merely a count of the legs. We can consider the partitioning
as a recursion or as a stateful application of defaultdict(list) object. We'll look at
the recursive definition of a groupby() function, since it's easy to design.
Clearly, the groupby(C, key) method for an empty collection, C, is the empty
dictionary, dict(). Or, more usefully, the empty defaultdict(list) object.
For a non-empty collection, we need to work with item C[0], the head, and
recursively process sequence C[1:], the tail. We can use head, *tail = C
command to do this parsing of the collection, as follows:
C= [1,2,3,4,5]
head, *tail= C
head
1
tail
[2, 3, 4, 5]
We need to do the dict[key(head)].append(head) method to include the head
element in the resulting dictionary. And then we need to do the groupby(tail,key)
method to process the remaining elements.
We can create a function as follows:
def group_by(key, data):
def group_into(key, collection, dictionary):
if len(collection) == 0:
return dictionary
head, *tail= collection
dictionary[key(head)].append(head)
return group_into(key, tail, dictionary)
return group_into(key, data, defaultdict(list))