The Itertools Module
Partitioning an iterator with groupby()
We can use the groupby() function to partition an iterator into smaller iterators.
This works by evaluating the given key() function for each item in the given
iterable. If the key value matches the previous item's key, the two items are part of
the same partition. If the key does not match the previous item's key, the previous
partition is ended and a new partition is started.
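As a minimal sketch of this partitioning rule, the following uses made-up sample data. The words list and the first-letter key are assumptions for illustration and are not part of the chapter's data.

```python
from itertools import groupby

# Hypothetical sample data, arranged so that equal first letters are adjacent.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# A new partition starts whenever the key (the first letter) changes.
partitions = [
    (letter, list(group))
    for letter, group in groupby(words, key=lambda w: w[0])
]
print(partitions)
```

Each group is converted to a list inside the comprehension, while that group is still the current one.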
The output from the groupby() function is a sequence of two-tuples. Each tuple
has the group's key value and an iterator over the items in the group. Each group's
iterator can be materialized as a tuple or processed to reduce it to some summary value.
Because the group iterators share the underlying iterable with groupby() itself, a
group's iterator can't be kept for later use: it is no longer valid once groupby()
advances to the next group.
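The following sketch illustrates why a group's items need to be collected while the group is current. The data list is made up for illustration. The exact contents of the stale groups depend on the Python version, so the example does not rely on them.

```python
from itertools import groupby

data = [1, 1, 2, 2, 3]

# Keeping the raw group iterators: by the time we read them, groupby() has
# already moved past every group, so the items are no longer available.
stale = [(key, group) for key, group in groupby(data)]
stale_items = [(key, list(group)) for key, group in stale]

# Materializing each group as a tuple while it is current keeps the items.
kept = [(key, tuple(group)) for key, group in groupby(data)]

print(stale_items)
print(kept)
```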
In the Running totals with accumulate() section, earlier in the chapter, we showed how
to compute quartile values for an input sequence.
Given the trip variable with the raw data and the quartile variable with the quartile
assignments, we can group the data using the following commands:
from itertools import groupby

group_iter = groupby(zip(quartile, trip), key=lambda q_raw: q_raw[0])
# Materialize each group's members as a tuple before groupby() advances.
for group_key, members in group_iter:
    print(group_key, tuple(members))
This will start by zipping the quartile numbers with the raw trip data, iterating over
two-tuples. The groupby() function will use the given lambda to group by
the quartile number. We used a for loop to examine the results of the groupby()
function. This shows how we get a group key value and an iterator over members
of the group.
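To try this without the earlier sections, here is a self-contained sketch. The values in trip and quartile are hypothetical stand-ins for the chapter's data, and the grouped dictionary is a name introduced only for this example.

```python
from itertools import groupby

# Hypothetical stand-ins: raw distances and their quartile numbers,
# already ordered so that equal quartile numbers are adjacent.
trip = [0.5, 1.2, 2.8, 3.1, 4.4, 5.0, 6.3, 7.9]
quartile = [0, 0, 1, 1, 2, 2, 3, 3]

# Map each quartile number to a tuple of its raw values.
grouped = {
    q: tuple(raw for _, raw in pairs)
    for q, pairs in groupby(zip(quartile, trip), key=lambda q_raw: q_raw[0])
}
print(grouped)
```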
The input to the groupby() function must be sorted by the key values. This will
ensure that all of the items in a group are adjacent. With unsorted input, the
same key can show up in several separate groups.
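A small sketch of the difference, using made-up data and the default identity key:

```python
from itertools import groupby

data = ["b", "a", "b", "a"]

# Unsorted input: every change of key starts a new group, so keys repeat.
unsorted_keys = [key for key, _ in groupby(data)]

# Sorted input: equal keys are adjacent, so each key appears once.
sorted_keys = [key for key, _ in groupby(sorted(data))]

print(unsorted_keys)  # ['b', 'a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```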
Note that we can also create groups using a defaultdict(list) object,
as follows:
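from collections import defaultdict

def groupby_2(iterable, key):
    # Collect every item under its key; the input need not be sorted.
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    # Yield an iterator over each group's members.
    for g in groups:
        yield iter(groups[g])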
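Unlike groupby(), groupby_2() yields only the group iterators, not the keys. The following sketch shows one possible variation that also yields each key. The name groupby_3 and the sample pairs are hypothetical and are not from the chapter.

```python
from collections import defaultdict

def groupby_3(iterable, key):
    """Hypothetical variant of groupby_2() that yields (key, iterator) pairs."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    for group_key, members in groups.items():
        yield group_key, iter(members)

# Unsorted input: items with equal keys do not need to be adjacent.
pairs = [(1, "b"), (0, "a"), (1, "d"), (0, "c")]
result = [(k, list(g)) for k, g in groupby_3(pairs, key=lambda p: p[0])]
print(result)
```

The trade-off is memory: this approach holds every item in the dictionary before yielding the first group, while groupby() works lazily over sorted input.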