Functional Python Programming

Chapter 6

This will build a tuple-of-tuples representation of each waypoint along the path in
the original KML file. It uses a low-level parser to extract rows of text data from the
original representation, and a high-level parser to transform the text items into
more useful tuples of floating-point values. In this case, we have not implemented
any validation.
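The two-level structure can be sketched as follows. This is a minimal illustration, not the chapter's KML code: the helper names text_rows() and float_rows() and the sample coordinate string are hypothetical, and we assume each item has the KML-style "longitude,latitude,altitude" layout.

```python
from typing import Iterable, Iterator


def text_rows(coordinates: str) -> Iterator[list[str]]:
    """Low-level parser (hypothetical): split a KML-style coordinate string into rows of text."""
    # Assumption: items are whitespace-separated, each looking like "lon,lat,alt"
    return (item.split(",") for item in coordinates.split())


def float_rows(rows: Iterable[list[str]]) -> Iterator[tuple[float, ...]]:
    """High-level parser (hypothetical): turn each row of text into a tuple of floats."""
    # No validation: a malformed item simply raises ValueError
    return (tuple(map(float, row)) for row in rows)


# Hypothetical sample data standing in for the text extracted from a KML file
sample = "-76.33,37.54,0.0 -76.27,37.84,0.0 -76.20,38.06,0.0"
path = tuple(float_rows(text_rows(sample)))
print(path)  # ((-76.33, 37.54, 0.0), (-76.27, 37.84, 0.0), (-76.2, 38.06, 0.0))
```

Keeping the two parsers separate means the low-level one only knows about text layout, while the high-level one only knows about converting values.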


Parsing CSV files

In Chapter 3, Functions, Iterators and Generators, we saw another example where we
parsed a CSV file that was not in a normalized form: we had to discard header rows
to make it useful. To do this, we used a simple function that extracted the header and
returned an iterator over the remaining rows.
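A function in that style might look like the following sketch. The name header_and_body() and its n parameter are hypothetical, not the Chapter 3 code. It consumes the header rows from a single shared iterator, so the iterator it returns resumes right after them.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def header_and_body(rows: Iterable[T], n: int = 1) -> tuple[list[T], Iterator[T]]:
    """Return the first n rows as a header list plus a lazy iterator over the rest."""
    row_iter = iter(rows)
    # islice() stops quietly if there are fewer than n rows
    header = list(islice(row_iter, n))
    return header, row_iter


rows = [["name", "value"], ["a", "1"], ["b", "2"]]
header, body = header_and_body(rows)
print(header)      # [['name', 'value']]
print(list(body))  # [['a', '1'], ['b', '2']]
```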


The data looks as follows:


Anscombe's quartet
I               II              III             IV
x       y       x       y       x       y       x       y
10.0    8.04    10.0    9.14    10.0    7.46    8.0     6.58
8.0     6.95    8.0     8.14    8.0     6.77    8.0     5.76
...
5.0     5.68    5.0     4.74    5.0     5.73    8.0     6.89


The columns are separated by tab characters, and there are three rows of headers
that we can discard.
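To see the layout concretely, here is a small sketch that reads a few of the lines shown above with csv.reader() and skips the three header rows with itertools.islice(). The exact placement of the group labels I to IV in the second header row is an assumption; it does not matter here because that row is discarded.

```python
import csv
import io
from itertools import islice

# First few lines of the quartet data, tab-delimited.
# Assumption: each group label sits above its "x" column, with an empty field above "y".
sample = (
    "Anscombe's quartet\n"
    "I\t\tII\t\tIII\t\tIV\t\n"
    "x\ty\tx\ty\tx\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\t10.0\t7.46\t8.0\t6.58\n"
    "8.0\t6.95\t8.0\t8.14\t8.0\t6.77\t8.0\t5.76\n"
)

rows = csv.reader(io.StringIO(sample), delimiter="\t")
data = list(islice(rows, 3, None))  # skip the three header rows
print(data[0])    # ['10.0', '8.04', '10.0', '9.14', '10.0', '7.46', '8.0', '6.58']
print(len(data))  # 2
```

Every remaining row has eight text fields: an x and a y column for each of the four series.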


Here's another version of that CSV-based parser. We've broken it into three
functions. The first function, row_iter_csv(), returns an iterator over the rows
in a tab-delimited file. It looks as follows:


import csv

def row_iter_csv(source):
    # Wrap the open file in a reader that splits each line on tab characters
    rdr = csv.reader(source, delimiter="\t")
    return rdr


This is a simple wrapper around the CSV parsing process. Looking back at the
earlier parsers for XML and plain text, a wrapper like this, which yields rows,
was the piece they were missing. Producing an iterable over rows can be a common
feature of parsers for normalized data.
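A quick way to see what the wrapper produces is to feed it a list of strings instead of an open file, since csv.reader() accepts any iterable of lines. The sample lines below are invented for illustration. Note that csv.reader() yields each row as a list of strings, so this sketch maps tuple() over the rows when actual row tuples are wanted.

```python
import csv
from typing import Iterable, Iterator


def row_iter_csv(source: Iterable[str]) -> Iterator[list[str]]:
    # Same thin wrapper as above: one list of strings per tab-delimited line
    return csv.reader(source, delimiter="\t")


# Any iterable of lines works, not only a file object
lines = ["x\ty\n", "10.0\t8.04\n", "8.0\t6.95\n"]
row_tuples = list(map(tuple, row_iter_csv(lines)))
print(row_tuples)  # [('x', 'y'), ('10.0', '8.04'), ('8.0', '6.95')]
```

Because the reader is lazy, nothing is parsed until the rows are consumed, which lets later filtering and conversion steps stay in the same pipeline.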


Once we have an iterable over the rows, we can keep rows that contain usable data
and reject rows that contain other metadata, such as titles and column names. We'll
introduce a helper function that we can use to do some of the parsing, plus a filter()
function to validate a row of data.
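One way to sketch that combination is shown below. The names float_none() and data_rows() are hypothetical and may not match the chapter's own functions. The helper converts text to a float or returns None, and filter() keeps only non-empty rows in which every field converted.

```python
from typing import Iterable, Iterator, Optional


def float_none(text: str) -> Optional[float]:
    """Hypothetical helper: parse text as a float, or return None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def data_rows(rows: Iterable[list[str]]) -> Iterator[list[Optional[float]]]:
    """Convert every field, then let filter() keep only rows where every field parsed."""
    converted = (list(map(float_none, row)) for row in rows)
    # Test for None explicitly: all(row) would also reject legitimate 0.0 values.
    # bool(row) rejects blank lines, which csv.reader() yields as empty lists.
    return filter(lambda row: bool(row) and None not in row, converted)


rows = [["Anscombe's quartet"], ["x", "y"], ["10.0", "8.04"], ["0.0", "6.95"]]
print(list(data_rows(rows)))  # [[10.0, 8.04], [0.0, 6.95]]
```

The title and column-name rows fail to convert, so they drop out without any special-case code for counting header lines.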
