The Functools Module
If we use the clean_sum(comma_fix_squared, d) function as part of computing
a standard deviation, we'll apply the comma-fixing operation to the data twice:
once to compute the sum and once to compute the sum of squares. This is a poor
design; caching the results with an lru_cache decorator can help, but materializing
the sanitized intermediate values as a temporary tuple object is probably better.
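As a sketch of that second approach (the names comma_fix and clean_sum, and the shape of the raw data, are assumptions here, standing in for the definitions given earlier):

```python
from typing import Callable, Iterable

def comma_fix(text: str) -> float:
    """Hypothetical cleanup: drop ',' separators, then convert to float."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return float("nan")

def clean_sum(cleaner: Callable[[str], float], data: Iterable[str]) -> float:
    """Apply the cleaner to each item and sum the results."""
    return sum(cleaner(d) for d in data)

d = ["1,956", "2,048", "91"]

# Materialize the sanitized values once as a temporary tuple...
fixed = tuple(comma_fix(x) for x in d)
# ...so both the sum and the sum of squares reuse the cleaned values.
total = sum(fixed)
total_2 = sum(x**2 for x in fixed)
```

Here comma_fix() runs exactly once per raw value, rather than once for the sum and again for the sum of squares.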
Using groupby() and reduce()
A common requirement is to summarize data after partitioning it into groups.
We can use a defaultdict(list) object to partition data. We can then analyze
each partition separately. In Chapter 4, Working with Collections, we looked at some
ways to group and partition. In Chapter 8, The Itertools Module, we looked at others.
Following is some sample data that we need to analyze:
>>> data = [('4', 6.1), ('1', 4.0), ('2', 8.3), ('2', 6.5),
...     ('1', 4.6), ('2', 6.8), ('3', 9.3), ('2', 7.8), ('2', 9.2),
...     ('4', 5.6), ('3', 10.5), ('1', 5.8), ('4', 3.8), ('3', 8.1),
...     ('3', 8.0), ('1', 6.9), ('3', 6.9), ('4', 6.2), ('1', 5.4),
...     ('4', 5.8)]
We've got a sequence of raw data values with a key and a measurement for each key.
One way to produce usable groups from this data is to build a dictionary that maps
each key to the list of items that share that key, as follows:
from collections import defaultdict
def partition(iterable, key=lambda x: x):
    """Sort not required."""
    pd = defaultdict(list)
    for row in iterable:
        pd[key(row)].append(row)
    for k in sorted(pd):
        yield k, iter(pd[k])
This will separate each item in the iterable into individual groups. The key()
function is used to extract a key value from each item; that key determines which
list in the pd dictionary the item is appended to. The result of this function
matches the result of the itertools.groupby() function: it's an iterable sequence
of (group key, iterator) pairs.
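To show the function at work, here is the partition() function applied to the sample data, grouping on the first element of each pair (the dictionary comprehension around it is just one convenient way to consume the (key, iterator) pairs):

```python
from collections import defaultdict

def partition(iterable, key=lambda x: x):
    """Sort not required."""
    pd = defaultdict(list)
    for row in iterable:
        pd[key(row)].append(row)
    for k in sorted(pd):
        yield k, iter(pd[k])

data = [('4', 6.1), ('1', 4.0), ('2', 8.3), ('2', 6.5),
        ('1', 4.6), ('2', 6.8), ('3', 9.3), ('2', 7.8), ('2', 9.2),
        ('4', 5.6), ('3', 10.5), ('1', 5.8), ('4', 3.8), ('3', 8.1),
        ('3', 8.0), ('1', 6.9), ('3', 6.9), ('4', 6.2), ('1', 5.4),
        ('4', 5.8)]

# Consume the (key, iterator) pairs into a dictionary of lists.
groups = {k: list(iterator)
          for k, iterator in partition(data, key=lambda x: x[0])}
```

Unlike itertools.groupby(), this version does not require the input to be sorted by key; each group collects every matching item, wherever it appears in the input.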