Chapter 8
We created a defaultdict object with a list as the value associated with each key.
The given key() function is applied to each item to compute its key value, and the
item is appended to the list stored under that key in the defaultdict.
Once all of the items are partitioned, we can return each partition as an iterator
over the items that share a common key. This is similar to the groupby() function,
except that the input iterator to this function doesn't need to be sorted. The groups
have the same members as the groups groupby() produces from sorted input, but the
groups might appear in a different order.
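The partitioning steps above can be sketched as follows. The name partition(), the sample data, and the mod3() key function are hypothetical and exist only for this illustration. The last lines compare the result with groupby() applied to sorted input, which shows the same members arriving in a different group order.

```python
from collections import defaultdict
from itertools import groupby

def partition(iterable, key):
    """Hypothetical helper: group items by key(item); input need not be sorted."""
    groups = defaultdict(list)
    for item in iterable:
        # Apply key() to the item and append it to the list under that key
        groups[key(item)].append(item)
    # Once every item is placed, yield an iterator over each group
    for k in groups:
        yield iter(groups[k])

def mod3(x):
    return x % 3

data = [1, 2, 3, 4, 5, 6, 7]

parts = [list(p) for p in partition(data, mod3)]
print(parts)  # [[1, 4, 7], [2, 5], [3, 6]]

# groupby() needs input sorted by the key; same members, different group order
by_groupby = [list(g) for _, g in groupby(sorted(data, key=mod3), mod3)]
print(by_groupby)  # [[3, 6], [1, 4, 7], [2, 5]]
```

The partitions follow the order in which each key first appears in the input, while groupby() on sorted data follows the order of the key values.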
Merging iterables with zip_longest() and zip()
We saw the zip() function in Chapter 4, Working with Collections. The zip_longest()
function differs from the zip() function in an important way: where the zip()
function stops at the end of the shortest iterable, the zip_longest() function pads
short iterables and stops at the end of the longest iterable.
The fillvalue keyword parameter allows filling with a value other than the default
value, None.
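A small sketch of the difference between the two functions; the lists xs and ys are arbitrary sample values chosen for illustration.

```python
from itertools import zip_longest

xs = [1, 2, 3]
ys = ["a", "b"]

# zip() stops at the end of the shortest iterable
shortest = list(zip(xs, ys))
print(shortest)  # [(1, 'a'), (2, 'b')]

# zip_longest() pads the shorter iterable with None by default
padded = list(zip_longest(xs, ys))
print(padded)  # [(1, 'a'), (2, 'b'), (3, None)]

# The fillvalue keyword parameter supplies a different padding value
filled = list(zip_longest(xs, ys, fillvalue="-"))
print(filled)  # [(1, 'a'), (2, 'b'), (3, '-')]
```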
For most exploratory data analysis applications, padding with a default value is
statistically difficult to justify: the fill value stands in for data that was never
observed. The Python Standard Library documentation shows a few clever things that
can be done with the zip_longest() function, but it's difficult to expand on these
without drifting far from our focus on data analysis.
Filtering with compress()
The built-in filter() function uses a predicate to determine if an item is passed
or rejected. The compress() function takes a different approach: instead of a function
that calculates a value, it uses a second, parallel iterable to determine which items
to pass and which to reject.
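A minimal sketch of compress() with a parallel iterable of selectors; the letters and the selector values are arbitrary illustrations. As the standard library documents, compress() stops as soon as either the data or the selectors are exhausted, which the last lines show.

```python
from itertools import compress

items = ["a", "b", "c", "d", "e", "f"]
# Parallel iterable of selectors: a truthy selector passes the matching item
selectors = [1, 0, 1, 0, 1, 1]

kept = list(compress(items, selectors))
print(kept)  # ['a', 'c', 'e', 'f']

# A shorter selector iterable ends the output early
truncated = list(compress(items, [1, 1]))
print(truncated)  # ['a', 'b']
```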
We can think of the filter() function as having the following definition:
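```python
from itertools import compress, tee

def filter(function, iterable):  # shadows the built-in; for illustration only
    i1, i2 = tee(iterable, 2)  # one clone supplies items, the other feeds the predicate
    return compress(i1, (function(x) for x in i2))
```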
We cloned the iterable using the tee() function. (We'll look at this function in
detail later.) We evaluated the filter predicate for each value of the second clone.
Then we provided the first clone and the iterable of predicate results to the
compress() function, which passes or rejects each value.
This builds the features of the filter() function from the more primitive features
of the compress() function.
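To check that this construction behaves like the built-in, here is a sketch under a hypothetical name, filter_via_compress(), so the built-in filter() stays available for comparison. It also assumes we want to mirror filter(None, iterable), which keeps truthy items, so a None predicate is replaced by bool. The tee() step matters because the source may be a one-shot iterator that can only be consumed once.

```python
from itertools import compress, tee

def filter_via_compress(function, iterable):
    """Hypothetical re-implementation of filter() built on compress()."""
    # Mirror filter(None, ...): with no predicate, keep the truthy items
    predicate = bool if function is None else function
    items, probe = tee(iterable, 2)
    return compress(items, (predicate(x) for x in probe))

def is_even(n):
    return n % 2 == 0

# iter() makes the source a one-shot iterator, so tee() is required
evens = list(filter_via_compress(is_even, iter(range(10))))
print(evens)  # [0, 2, 4, 6, 8]

truthy = list(filter_via_compress(None, [0, 1, "", "x", None]))
print(truthy)  # [1, 'x']
```

Because compress() consumes the two clones in step, one item at a time, the buffer that tee() keeps between them stays small.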