The Itertools Module
We can, with a small change, use the random.randrange(c) function instead
of the cycle(c) function to select a randomized subset of similar size.
We can also rewrite this function to use the compress(), filter(), and islice()
functions, as we'll see later in this chapter.
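The randomized variant might look like the following minimal sketch. It assumes the selection is driven by a stream of Boolean values passed to compress(); the names randseq and pick_random_subset, and the seed parameter, are hypothetical and not part of the original design.

```python
import random
from itertools import compress

def randseq(limit, rng):
    """Yield an endless stream of random integers in range(limit)."""
    while True:
        yield rng.randrange(limit)

def pick_random_subset(data, c, seed=None):
    """Keep each item with probability 1/c (hypothetical helper)."""
    rng = random.Random(seed)
    # True whenever the random value is 0, which happens about 1 time in c.
    chooser = (x == 0 for x in randseq(c, rng))
    # compress() stops when data is exhausted, so the endless chooser is safe.
    return list(compress(data, chooser))

sample = pick_random_subset(range(1000), 10, seed=42)
```

Unlike the cycle-based version, the subset size here is only approximately 1/c of the input, and the items picked depend on the seed.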
This design will also reformat a file from any nonstandard CSV-like format into
a standardized CSV format. As long as we define parser functions that return
consistently defined tuples and write consumer functions that write tuples to the
target files, we can do a great deal of cleansing and filtering with relatively short,
clear scripts.
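As an illustration of the parser and consumer roles, here is a small sketch that assumes a pipe-delimited input format. The functions parse_pipe_rows and write_csv_rows, and the sample data, are invented for this example.

```python
import csv
import io

def parse_pipe_rows(source):
    """Parser: yield one tuple of stripped fields per nonblank line."""
    for line in source:
        if line.strip():
            yield tuple(field.strip() for field in line.split("|"))

def write_csv_rows(rows, target):
    """Consumer: write each tuple as one standard CSV row."""
    writer = csv.writer(target, lineterminator="\n")
    writer.writerows(rows)

# In-memory files stand in for the real source and target files.
raw = io.StringIO("id | name\n1 | Ada, Countess\n\n2 | Grace\n")
out = io.StringIO()
write_csv_rows(parse_pipe_rows(raw), out)
```

Because the parser yields plain tuples, a filtering step can be placed between the two functions without changing either of them.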
Repeating a single value with repeat()
The repeat() function seems like an odd feature: it returns a single value over and
over again. It can serve as a replacement for the cycle() function when only one
value is needed. We can extend our data subset selection function by using
repeat(0) instead of cycle(range(100)) in an expression such as
(x == 0 for x in some_function).
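Consider the following definitions:
from itertools import cycle, repeat
all_data = repeat(0)  # always 0, so every item is picked
subset = cycle(range(100))  # 0 once per 100 values, so 1 item in 100 is picked
chooser = (x == 0 for x in either_all_or_subset)  # either all_data or subset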
This allows us to make a simple parameter change, which will either pick all data
or pick a subset of data.
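The parameter change can be sketched as follows, assuming the chooser is applied with compress(). The select function and the sample data are hypothetical.

```python
from itertools import compress, cycle, repeat

def select(data, source):
    """Keep the items whose matching value from source equals 0."""
    chooser = (x == 0 for x in source)
    return list(compress(data, chooser))

rows = list(range(250))
everything = select(rows, repeat(0))           # every value is 0: keep all
one_percent = select(rows, cycle(range(100)))  # 0 once per 100: keep 1 in 100
```

Only the second argument differs between the two calls; the selection logic itself is unchanged.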
We can embed the repeat() function in nested loops to create more complex
structures. Here's a simple example:
>>> list(tuple(repeat(i, times=i)) for i in range(10))
[(), (1,), (2, 2), (3, 3, 3), (4, 4, 4, 4), (5, 5, 5, 5, 5),
(6, 6, 6, 6, 6, 6), (7, 7, 7, 7, 7, 7, 7), (8, 8, 8, 8, 8, 8, 8, 8),
(9, 9, 9, 9, 9, 9, 9, 9, 9)]
>>> list(sum(repeat(i, times=i)) for i in range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We created repeating sequences of numbers using the times parameter of the
repeat() function.