Functional Python Programming


The Itertools Module


We can use the dropwhile() and takewhile() functions when parsing files to skip
headers or footers in the input. The dropwhile() function rejects header rows and
passes the remaining data. The takewhile() function passes data and rejects trailer
rows. We'll return to the simple GPL file format shown in Chapter 3, Functions,
Iterators, and Generators. The file has a header that looks as follows:


GIMP Palette
Name: Crayola
Columns: 16
#

This is followed by rows that look like the following example:


255 73 108 Radical Red
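
Before turning to the file itself, the header and trailer behavior described above can be sketched on a plain list of lines. This is only an illustration: the sample shown here has no trailer, so the "END" marker and the line after it are hypothetical, as is the `lines` list.

```python
from itertools import dropwhile, takewhile

# Hypothetical input: the header ends at the "#" line, and a made-up
# "END" line marks where an imaginary trailer begins.
lines = [
    "GIMP Palette",
    "Name: Crayola",
    "Columns: 16",
    "#",
    "255 73 108 Radical Red",
    "END",
    "trailer text",
]

# dropwhile() rejects items while the predicate is true, then passes the rest.
after_header = dropwhile(lambda line: line != "#", lines)
next(after_header)  # consume the "#" line itself

# takewhile() passes items while the predicate is true, then stops for good.
body = list(takewhile(lambda line: line != "END", after_header))
print(body)
```

Both functions are stateful: once the predicate changes value, dropwhile() never tests it again and takewhile() never yields again.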

We can easily locate the final line of the headers—the # line—using a parser based
on the dropwhile() function, as follows:


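import csv
from itertools import dropwhile

with open("crayola.gpl") as source:
    rdr = csv.reader(source, delimiter='\t')
    # Reject rows until the '#' line; that line and all later rows pass.
    rows = dropwhile(lambda row: row[0] != '#', rdr)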


We created a CSV reader to parse the lines based on tab characters. This neatly
separates the color three-tuple from the name; the three-tuple will need further
parsing. The dropwhile() expression produces an iterator that starts with the # line
and continues with the rest of the file.
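
To see what this iterator yields without needing the crayola.gpl file, here is a minimal sketch that applies the same dropwhile() expression to an in-memory string. The SAMPLE text and the use of io.StringIO are assumptions made for illustration; the sample assumes a tab between the color numbers and the name.

```python
import csv
import io
from itertools import dropwhile

# Hypothetical stand-in for crayola.gpl, built from the header and the
# single data row shown in the text. A tab separates color from name.
SAMPLE = (
    "GIMP Palette\n"
    "Name: Crayola\n"
    "Columns: 16\n"
    "#\n"
    "255 73 108\tRadical Red\n"
)

source = io.StringIO(SAMPLE)
rdr = csv.reader(source, delimiter="\t")

# Header rows before "#" are rejected; "#" and everything after it pass.
rows = list(dropwhile(lambda row: row[0] != "#", rdr))
print(rows)  # [['#'], ['255 73 108', 'Radical Red']]
```

Note that the # row is still present as the first item, which is why one more step is needed to discard it.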


We can use the islice() function to discard the first item of an iterable. We can then
parse the color details as follows:


    # Continues inside the with block; needs "from itertools import islice".
    color_rows = islice(rows, 1, None)
    colors = ((color.split(), name) for color, name in color_rows)
    print(list(colors))


The islice(rows, 1, None) expression is similar to asking for a rows[1:] slice:
the first item is quietly discarded. Once the last of the heading rows has been
discarded, we can parse the color tuples and return more useful color objects.
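
The whole pipeline can be sketched end to end as follows. The parse_colors() helper, the SAMPLE text, and the conversion of the color fields to integers are assumptions added for illustration and are not part of the original listing.

```python
import csv
import io
from itertools import dropwhile, islice

# Hypothetical in-memory stand-in for crayola.gpl (tab before the name).
SAMPLE = "GIMP Palette\nName: Crayola\nColumns: 16\n#\n255 73 108\tRadical Red\n"


def parse_colors(source):
    """Hypothetical helper: return ((r, g, b), name) pairs from a GPL-like file."""
    rdr = csv.reader(source, delimiter="\t")
    rows = dropwhile(lambda row: row[0] != "#", rdr)
    color_rows = islice(rows, 1, None)  # discard the "#" row itself
    return [
        (tuple(int(c) for c in color.split()), name)
        for color, name in color_rows
    ]


colors = parse_colors(io.StringIO(SAMPLE))
print(colors)  # [((255, 73, 108), 'Radical Red')]

# islice(..., 1, None) on an iterator mirrors [1:] on a list.
items = ["#", "a", "b"]
sliced = list(islice(iter(items), 1, None))
print(sliced == items[1:])  # True
```

Unlike a rows[1:] slice, islice() works on any iterator and does not build an intermediate list.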


For this particular file, we can also use the number of columns located by the CSV
reader function. The dropwhile(lambda row: len(row) == 1, rdr) expression discards
the header rows, including the # line, because each of them parses to a single
column. This approach doesn't generalize well: locating the last line of the headers
is often easier than trying to locate some general feature that distinguishes header
(or trailer) lines from the meaningful file content.
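
A short sketch shows both the column-count approach and one way it can go wrong. The drop_by_width() helper and both sample strings are hypothetical; the "awkward" variant assumes a header line that happens to contain a tab.

```python
import csv
import io
from itertools import dropwhile


def drop_by_width(text):
    """Hypothetical helper: drop leading rows that parse to a single column."""
    rdr = csv.reader(io.StringIO(text), delimiter="\t")
    return list(dropwhile(lambda row: len(row) == 1, rdr))


# Header as shown in the text: every header row has exactly one column.
clean = "GIMP Palette\nName: Crayola\nColumns: 16\n#\n255 73 108\tRadical Red\n"

# Made-up variant: a tab inside a header line yields a two-column row.
awkward = "GIMP Palette\nName:\tCrayola\nColumns: 16\n#\n255 73 108\tRadical Red\n"

good = drop_by_width(clean)
bad = drop_by_width(awkward)

print(good)    # [['255 73 108', 'Radical Red']]
print(bad[0])  # ['Name:', 'Crayola']
```

In the second case dropwhile() stops at the first two-column row, so the remaining header lines leak into the data. Matching the # line explicitly avoids this.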
