Functional Python Programming

Chapter 6

The internal group() function steps through the sorted sequence of data items. If
a given item has already been seen (it matches the value in previous), then the
counter is incremented. If a given item does not match the previous value and
the previous value is not None, then the value has changed: we emit the
previous value and its count, and begin a new count for the new value. The third
condition applies only once: if the previous value has never been set, then this
is the first item, and we save it.
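
The three conditions can be sketched as follows. This is a reconstruction from
the description above rather than the original listing: the wrapper name
group_by_count() is hypothetical, and the guard on the final yield (so that
empty input produces an empty dictionary) is an added assumption.

```python
def group_by_count(data):
    """Hypothetical wrapper: map each distinct value to its count."""

    def group(data):
        previous, count = None, 0
        for d in sorted(data):
            if d == previous:
                # Already seen: extend the current run.
                count += 1
            elif previous is not None:
                # Value changed: emit the finished run and start a new one.
                yield previous, count
                previous, count = d, 1
            elif previous is None:
                # First item only: save it.
                previous, count = d, 1
            else:
                raise Exception("Bad bad design problem.")
        if previous is not None:
            # Emit the last run (skipped when the input was empty).
            yield previous, count

    return dict(group(data))
```

Calling group_by_count([4, 1, 4, 2, 1, 4]) groups the sorted items 1, 1, 2, 4, 4, 4
into the pairs (1, 2), (2, 1), and (4, 3) before building the dictionary.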


The final line of the function creates a dictionary from the grouped items. This
dictionary will be similar to a Counter object. The primary difference is that a
Counter object has a most_common() method, which a plain dictionary lacks.
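
A small comparison may make the difference concrete. The sample data is made
up for illustration; Counter and most_common() come from the standard
collections module.

```python
from collections import Counter

data = ["b", "a", "b", "c", "b", "a"]  # made-up sample data

counts = Counter(data)
plain = dict(counts)  # same keys and counts, but an ordinary dict

# A Counter can rank its items directly.
top_two = counts.most_common(2)

# A plain dict has no such method, so the ranking must be done by hand.
has_method = hasattr(plain, "most_common")
ranked = sorted(plain.items(), key=lambda kv: kv[1], reverse=True)[:2]
```

Here top_two and ranked hold the same two pairs, ("b", 3) and ("a", 2), while
has_method is False.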


The elif previous is None case is an irksome overhead: it is tested for every
item, yet it can only be true for the first one. Getting rid of this elif clause
(and seeing a slight performance improvement) isn't terribly difficult.


To remove the extra elif clause, we need to use a slightly more elaborate
initialization in the internal group() function:


def group(data):
    sorted_data = iter(sorted(data))
    try:
        # Seed the state with the first item, so no None placeholder is needed.
        previous, count = next(sorted_data), 1
    except StopIteration:
        return  # Empty input: nothing to yield.
    for d in sorted_data:
        if d == previous:
            count += 1
        elif previous is not None:  # and d != previous
            yield previous, count
            previous, count = d, 1
        else:
            raise Exception("Bad bad design problem.")
    yield previous, count


This picks the first item out of the sorted data to initialize the previous variable.
The remaining items are then processed through the loop. This design shows a loose
parallel with recursive designs where we initialize the recursion with the first item,
and each recursive call provides either a next item or None to indicate that no items
are left to process.
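
The recursive parallel might look like the following sketch. The names
group_recursive() and step() are hypothetical, and the sketch assumes that
None never appears in the data, because None marks the end of the input. It
is for illustration only: Python's recursion limit makes it unsuitable for long
sequences.

```python
def group_recursive(data):
    """Hypothetical recursive counterpart of group(); returns a list of pairs."""
    sorted_data = iter(sorted(data))

    def step(previous, count, d):
        if d is None:
            # No items left: emit the final run.
            return [(previous, count)]
        if d == previous:
            # Same value: extend the run and recurse on the next item.
            return step(previous, count + 1, next(sorted_data, None))
        # Value changed: emit the finished run, then start a new one.
        return [(previous, count)] + step(d, 1, next(sorted_data, None))

    # Initialize the recursion with the first item, if there is one.
    first = next(sorted_data, None)
    if first is None:
        return []
    return step(first, 1, next(sorted_data, None))
```

As with the loop version, the first item seeds the state, and each later call
receives either the next item or None.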


We can also do this with itertools.groupby(). We'll look at this function closely in
Chapter 8, The Itertools Module.
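
As a brief preview, here is a minimal sketch of the same grouping built on
itertools.groupby(). The function name group_with_groupby() is hypothetical.
The data must be sorted first, because groupby() only groups adjacent equal
items.

```python
from itertools import groupby


def group_with_groupby(data):
    """Hypothetical helper: count each distinct value using groupby()."""
    # groupby() yields (key, iterator over the run of equal items);
    # counting the items in each run gives the frequency of that key.
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(data))}
```

For the input [4, 1, 4, 2, 1, 4], this produces the same dictionary as the
hand-written group() generator: 1 maps to 2, 2 maps to 1, and 4 maps to 3.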
