Chapter 6
A common operation that can be approached either as a stateful map or as a
materialized, sorted object is computing the mode of a set of data values. When we
look at our trip data, the variables are all continuous. To compute a mode, we'll need
to quantize the distances covered. This is also called binning: we'll group the data
into different bins. Binning is also common in data visualization applications. In this
case, we'll use 5 nautical miles as the size of each bin.
The quantized distances can be produced with a generator expression:
quantized = (5 * (dist // 5) for start, stop, dist in trip)
This divides each distance by 5, discarding any fractional part, and then multiplies
by 5 to compute a number that represents the distance rounded down to the nearest 5
nautical miles.
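To make the binning concrete, here is a minimal sketch that runs the generator expression over a small, hypothetical trip. The trip data, its placeholder start and stop labels, and its distances are invented for illustration; the only assumption about structure is that each leg is a (start, stop, distance) tuple, as the expression above implies.

```python
# Hypothetical trip data: (start, stop, distance) tuples. The labels are
# placeholders and the distances are made-up values in nautical miles.
trip = [
    ("A", "B", 17.7),
    ("B", "C", 23.4),
    ("C", "D", 9.9),
    ("D", "E", 5.0),
    ("E", "F", 21.1),
]

# Round each distance down to the nearest multiple of 5 nautical miles.
quantized = (5 * (dist // 5) for start, stop, dist in trip)

# Materialize the lazy generator so the bins can be inspected.
bins = list(quantized)
print(bins)  # [15.0, 20.0, 5.0, 5.0, 20.0]
```

Because the distances are floats, floor division yields floats as well, so the bin labels come out as 15.0, 20.0, and so on rather than integers.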
Building a mapping with Counter
A mapping like the collections.Counter class is a great optimization for doing
reductions that create counts (or totals) grouped by some value in the collection.
A more typical functional programming solution to grouping data is to sort the
original collection, and then use a recursive loop to identify when each group begins.
This involves materializing the raw data, performing an O(n log n) sort, and then
doing a reduction to get the sums or counts for each key.
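The sort-then-group approach can be sketched as follows. This sketch swaps in the standard library's itertools.groupby for the explicit recursive loop described above; groupby walks the sorted sequence and starts a new group each time the key changes, which is the same boundary detection. The function name counts_by_sorting and the sample bin values are hypothetical.

```python
from itertools import groupby

def counts_by_sorting(values):
    """Count each distinct value by sorting, then grouping runs of equal keys.

    Hypothetical helper for illustration; it materializes the whole input.
    """
    ordered = sorted(values)  # materialize and sort: O(n log n)
    # groupby yields (key, run) pairs; each run holds consecutive equal items.
    return [(key, sum(1 for _ in run)) for key, run in groupby(ordered)]

# Hypothetical quantized distances (bins of 5 nautical miles).
bins = [15.0, 20.0, 5.0, 20.0, 20.0, 5.0]
counts = counts_by_sorting(bins)
print(counts)  # [(5.0, 2), (15.0, 1), (20.0, 3)]

# The mode is the key whose count is largest.
mode, _ = max(counts, key=lambda pair: pair[1])
print(mode)  # 20.0
```

The sort is what guarantees that equal keys are adjacent; groupby on unsorted data would report the same key several times, once per separate run.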
We'll use the following generator to create a simple sequence of distances
transformed into bins:
quantized = (5 * (dist // 5) for start, stop, dist in trip)
We divided each distance by 5 using truncated integer division, and then multiplied
by 5 to create a value that's rounded down to the nearest 5 nautical miles.
The following expression creates a mapping from distance to frequency:
from collections import Counter
Counter(quantized)
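A minimal, self-contained sketch of the Counter approach, extended to pull out the mode with Counter.most_common. The trip tuples below are hypothetical sample data in the (start, stop, distance) shape used throughout this section.

```python
from collections import Counter

# Hypothetical trip legs: (start, stop, distance in nautical miles).
trip = [
    ("A", "B", 17.7),
    ("B", "C", 23.4),
    ("C", "D", 9.9),
    ("D", "E", 21.1),
    ("E", "F", 24.0),
    ("F", "G", 12.5),
]

# Bin each distance down to the nearest multiple of 5 nautical miles.
quantized = (5 * (dist // 5) for start, stop, dist in trip)

# Counter consumes the generator and maps each bin to its frequency.
frequency = Counter(quantized)

# most_common(1) returns a one-element list holding the (bin, count) pair
# with the highest count; that bin is the mode.
mode, count = frequency.most_common(1)[0]
print(mode, count)  # 20.0 3
```

Note that a generator expression can be consumed only once: after Counter has read it, the same quantized object yields nothing, so it must be rebuilt if the bins are needed again.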
This is a stateful object that was, technically, created by imperative object-oriented
programming. Since it looks like a function, however, it seems a good fit for a design
based on functional programming ideas.