Functional Python Programming


The Itertools Module


In the Reiterating a cycle with cycle() section of this chapter, we looked at data selection
using a simple generator expression. Its essence was as follows:


chooser = (x == 0 for x in cycle(range(c)))
keep = (row for pick, row in zip(chooser, some_source) if pick)


We defined a generator expression, chooser, which produces a True value followed
by c-1 False values. This cycle repeats, allowing us to pick only 1/c of the rows
from the source.
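As a minimal, self-contained sketch of the idea above: the value of c and the contents of some_source are made up for illustration, and kept_rows is a hypothetical name used only to show the result.

```python
from itertools import cycle

c = 3  # assumed sampling interval: keep one row out of every three
some_source = [f"row{i}" for i in range(10)]  # stand-in for a real data source

# True for the first row of each group of c rows, False for the other c-1 rows.
chooser = (x == 0 for x in cycle(range(c)))

# Pair each row with its flag and keep only the flagged rows.
keep = (row for pick, row in zip(chooser, some_source) if pick)

kept_rows = list(keep)
print(kept_rows)  # ['row0', 'row3', 'row6', 'row9']
```

Because zip() stops at the shorter input, the infinite chooser does no harm: iteration ends when some_source is exhausted.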


We can replace the cycle(range(c)) function with the repeat(0) function to
select all rows. We can also replace it with a generator that yields
random.randrange(c) values to randomize the selection of rows.


The keep expression is really just a compress(some_source, chooser) call.
If we make that change, the processing is simplified:


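import random
from itertools import compress, cycle, repeat

# c and some_source are defined elsewhere, as in the previous example.
all = repeat(0)  # every value is 0, so every row is kept (shadows the built-in all())
subset = cycle(range(c))  # 0, 1, ..., c-1, repeated
# randrange() returns a single int, so wrap it in an endless generator.
randomized = (random.randrange(c) for _ in repeat(None))
selection_rule = subset  # or all, or randomized
chooser = (x == 0 for x in selection_rule)
keep = compress(some_source, chooser)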


We've defined three alternative selection rules: all, subset, and randomized.
The subset version will pick exactly 1/c of the rows from the source; the
randomized version will pick about 1/c of them on average. The chooser
expression will build an iterable over True and False values based on one of the
selection rules. The rows to be kept are selected by applying the row selection
iterable to the source iterable.
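The three rules can be tried side by side with a short sketch. The helper names select_rows and random_values, the seed argument, and the sample data are assumptions made for this illustration; they are not part of itertools.

```python
import random
from itertools import compress, cycle, repeat

def select_rows(source, rule):
    """Keep the rows whose matching rule value is 0 (hypothetical helper)."""
    chooser = (x == 0 for x in rule)
    return compress(source, chooser)

def random_values(c, seed=None):
    """Endless stream of random.randrange(c)-style values (hypothetical helper)."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(c)

c = 4                  # assumed sampling interval
rows = list(range(20))  # stand-in for a real data source

kept_all = list(select_rows(rows, repeat(0)))            # every row
kept_subset = list(select_rows(rows, cycle(range(c))))   # every c-th row
kept_random = list(select_rows(rows, random_values(c, seed=42)))  # about 1/c of rows

print(kept_all == rows)  # True
print(kept_subset)       # [0, 4, 8, 12, 16]
```

Only the rule changes between the three calls; the chooser and the compress() step stay the same, which is what makes the rule easy to swap.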


Since all of this is non-strict, rows are not read from the source until required.
This allows us to process very large sets of data efficiently. Also, the relative
simplicity of the Python code means that we don't really need a complex
configuration file and an associated parser to make choices among the selection
rules. We have the option to use this bit of Python code as the configuration for
a larger data sampling application.
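The non-strict behavior can be observed directly. In this sketch, noisy_source and the reads list are hypothetical instrumentation added only to record which rows are actually pulled from the source.

```python
from itertools import compress, cycle, islice

reads = []  # records every row that is actually read from the source

def noisy_source(n):
    """Yield 0..n-1, noting each row as it is read (illustrative only)."""
    for i in range(n):
        reads.append(i)
        yield i

c = 3  # assumed sampling interval
chooser = (x == 0 for x in cycle(range(c)))
keep = compress(noisy_source(1_000_000), chooser)

reads_before = len(reads)          # nothing has been read yet
first_two = list(islice(keep, 2))  # ask for only two kept rows

print(reads_before)  # 0
print(first_two)     # [0, 3]
print(reads)         # [0, 1, 2, 3]
```

Building the pipeline reads nothing, and asking for two kept rows reads only the four source rows needed to produce them, even though the source could supply a million.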


Picking subsets with islice()


In Chapter 4, Working with Collections, we looked at slice notation to select subsets
from a collection. Our example was to pair up items sliced from a list object.
The following is a simple list:


flat = ['2', '3', '5', '7', '11', '13', '17', '19', '23', '29', '31',
        '37', '41', '43', '47', '53', '59', '61', '67', '71',... ]
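As a reminder of the slice-based pairing mentioned above, here is a minimal sketch on a shortened list. The name flat_sample and its six values are chosen for illustration, and the islice() form is offered as an assumption about how the same pairing might be written lazily, not as the chapter's own code.

```python
from itertools import islice

flat_sample = ['2', '3', '5', '7', '11', '13']  # shortened illustrative list

# Slice notation: even-indexed items paired with odd-indexed items.
pairs_by_slice = list(zip(flat_sample[0::2], flat_sample[1::2]))

# The same pairing with islice(), which avoids building intermediate lists.
pairs_by_islice = list(zip(
    islice(flat_sample, 0, None, 2),
    islice(flat_sample, 1, None, 2),
))

print(pairs_by_slice)  # [('2', '3'), ('5', '7'), ('11', '13')]
```

Slices copy part of the list, while islice() works on any iterable and produces items only as they are requested.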
