Recursions and Reductions
    return (comma_split(coordinates.text)
        for coordinates in
        doc.findall("./ns0:Document/ns0:Folder/ns0:Placemark/ns0:Point/ns0:coordinates", ns_map)
    )
The bulk of the row_iter_kml() function is the XML parsing that allows us to use
the doc.findall() function to iterate through the coordinates tags of the
document. We've used a function named comma_split() to parse the text of each of
these tags into a three-tuple of values.
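The start of row_iter_kml() isn't shown in this section, so what follows is only a minimal, self-contained sketch under stated assumptions: it assumes the document is parsed with xml.etree.ElementTree, that ns_map maps the ns0 prefix to the standard KML 2.2 namespace URI, and that comma_split() (hypothetical here) simply splits its text on ",". The sample KML text is invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: a minimal KML file with two placemarks.
# The namespace URI is the standard KML 2.2 one; the coordinates are made up.
SAMPLE_KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder>
<Placemark><Point><coordinates>-76.33,37.54,0</coordinates></Point></Placemark>
<Placemark><Point><coordinates>-76.25,37.50,0</coordinates></Point></Placemark>
</Folder></Document>
</kml>"""

def comma_split(text):
    # Assumed helper: strip whitespace, then split "lon,lat,alt" into three strings.
    return tuple(text.strip().split(","))

def row_iter_kml(file_obj):
    # Assumed namespace map: "ns0" in the path stands for the KML 2.2 namespace.
    ns_map = {"ns0": "http://www.opengis.net/kml/2.2"}
    doc = ET.parse(file_obj)
    # Lazily yield one tuple of text values per coordinates tag.
    return (comma_split(coordinates.text)
            for coordinates in
            doc.findall("./ns0:Document/ns0:Folder/ns0:Placemark/ns0:Point/ns0:coordinates", ns_map))

rows = list(row_iter_kml(io.BytesIO(SAMPLE_KML)))
print(rows)  # [('-76.33', '37.54', '0'), ('-76.25', '37.50', '0')]
```

The values are still strings at this stage; converting them to numbers is a separate step.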
This function is focused on working with the normalized XML structure. The document
mostly fits the database designer's definition of First Normal Form, that is, each
attribute is atomic and holds only a single value. Each row in the XML data has the
same columns, with data of a consistent type. The data values aren't completely
atomic, however; we had to split each point on "," to separate the longitude,
latitude, and altitude into atomic string values.
A large volume of data (XML tags, attributes, and other punctuation) was reduced
to a somewhat smaller volume that includes just the floating-point latitude and
longitude values. For this reason, we can think of parsers as a kind of reduction.
We'll need a higher-level set of conversions to map the tuples of text into
floating-point numbers. We'd also like to discard the altitude and reorder the
longitude and latitude. This will produce the application-specific tuple we need.
We can use the following functions for this conversion:
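def pick_lat_lon(lon, lat, alt):
    # Discard the altitude; put latitude before longitude.
    return lat, lon

def float_lat_lon(row_iter):
    # Lazily convert each (lon, lat, alt) text row into a (lat, lon) float pair.
    return (tuple(map(float, pick_lat_lon(*row)))
        for row in row_iter)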
The essential tool is the float_lat_lon() function. It returns a generator
expression; the higher-order function doing the work inside it is map(). The
generator uses map() to apply the float() conversion to the results of the
pick_lat_lon() function. We've used the *row syntax to unpack each row tuple, so
that each member is assigned to a different parameter of the pick_lat_lon()
function. This function then returns a tuple of the selected items in the
required order.
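To see the *row unpacking and the lazy evaluation at work, here is a small sketch that feeds float_lat_lon() a hypothetical list of text rows shaped like the parser's output, rather than data from a real KML file.

```python
def pick_lat_lon(lon, lat, alt):
    # Discard the altitude; put latitude before longitude.
    return lat, lon

def float_lat_lon(row_iter):
    # Lazily convert each (lon, lat, alt) text row into a (lat, lon) float pair.
    return (tuple(map(float, pick_lat_lon(*row)))
            for row in row_iter)

# Hypothetical text rows, shaped like the three-tuples of strings from the parser.
text_rows = [("-76.33", "37.54", "0"), ("-76.25", "37.50", "0")]

# *row unpacks one tuple into three positional arguments.
row = text_rows[0]
print(pick_lat_lon(*row))                   # ('37.54', '-76.33')
print(pick_lat_lon(row[0], row[1], row[2])) # ('37.54', '-76.33'), the same call spelled out

# Building the generator does no conversion; list() drives it.
pairs = float_lat_lon(text_rows)
result = list(pairs)
print(result)  # [(37.54, -76.33), (37.5, -76.25)]
```

Because float_lat_lon() returns a generator, no float() conversion happens until something such as list() or tuple() consumes it, and the same generator can't be iterated a second time.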
We can use this parser as follows:
import urllib.request

with urllib.request.urlopen("file:./Winter%202012-2013.kml") as source:
    # tuple() materializes the lazy pipeline into a tuple of (lat, lon) pairs.
    trip = tuple(float_lat_lon(row_iter_kml(source)))
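The Winter 2012-2013.kml file isn't available here, so this sketch exercises the complete pipeline against a small hypothetical KML file written to a temporary directory. As before, the comma_split() helper and the namespace map are assumptions, and the coordinates are made up. Path.as_uri() produces an absolute file: URL, which urlopen() reads the same way as the relative URL in the example above.

```python
import pathlib
import tempfile
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical KML content; the coordinates are made-up sample values.
SAMPLE_KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder>
<Placemark><Point><coordinates>-76.33,37.54,0</coordinates></Point></Placemark>
<Placemark><Point><coordinates>-76.25,37.50,0</coordinates></Point></Placemark>
</Folder></Document>
</kml>"""

def comma_split(text):
    # Assumed helper: split "lon,lat,alt" into three strings.
    return tuple(text.strip().split(","))

def row_iter_kml(file_obj):
    # Assumed namespace map; ET.parse() reads the whole file eagerly.
    ns_map = {"ns0": "http://www.opengis.net/kml/2.2"}
    doc = ET.parse(file_obj)
    return (comma_split(coordinates.text)
            for coordinates in
            doc.findall("./ns0:Document/ns0:Folder/ns0:Placemark/ns0:Point/ns0:coordinates", ns_map))

def pick_lat_lon(lon, lat, alt):
    return lat, lon

def float_lat_lon(row_iter):
    return (tuple(map(float, pick_lat_lon(*row)))
            for row in row_iter)

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "sample.kml"
    path.write_bytes(SAMPLE_KML)
    # Path.as_uri() builds an absolute file: URL that urlopen() can read.
    with urllib.request.urlopen(path.as_uri()) as source:
        trip = tuple(float_lat_lon(row_iter_kml(source)))

print(trip)  # ((37.54, -76.33), (37.5, -76.25))
```

Each stage is a small function, and the composition float_lat_lon(row_iter_kml(source)) reads from the inside out: parse, then pick and convert.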