Working with Collections
We'll be looking for constructs such as the following XML snippet:
<Placemark><Point>
<coordinates>-76.33029518659048,37.54901619777347,0</coordinates>
</Point></Placemark>
The file will have a number of <Placemark> tags, each of which has a point and
coordinate structure within it. This is typical of Keyhole Markup Language (KML)
files that contain geographic information.
Parsing an XML file can be approached at two levels of abstraction. At the lower
level, we need to locate the various tags, attribute values, and content within the
XML file. At a higher level, we want to make useful objects out of the text and
attribute values.
The lower-level processing can be approached in the following way:
import xml.etree.ElementTree as XML

def row_iter_kml(file_obj):
    # Map short prefixes to the namespace URIs used in KML files.
    ns_map = {
        "ns0": "http://www.opengis.net/kml/2.2",
        "ns1": "http://www.google.com/kml/ext/2.2"}
    doc = XML.parse(file_obj)
    # Lazily yield the split text of each <coordinates> element.
    return (comma_split(coordinates.text)
        for coordinates in
        doc.findall("./ns0:Document/ns0:Folder/ns0:Placemark/"
            "ns0:Point/ns0:coordinates", ns_map))
This function requires a file that was already opened, usually via a with statement.
However, it can also be any of the file-like objects that the XML parser can handle.
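As a minimal sketch of the "any file-like object" point, the function can be fed an in-memory io.StringIO buffer instead of a file on disk. The SAMPLE_KML text below is a made-up document for illustration only (its second coordinate triple is invented), and the function body is repeated so that the example runs on its own:

```python
import io
import xml.etree.ElementTree as XML

# Made-up KML text for illustration; not taken from a real data file.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Folder>
      <Placemark><Point>
        <coordinates>-76.33029518659048,37.54901619777347,0</coordinates>
      </Point></Placemark>
      <Placemark><Point>
        <coordinates>-76.25,37.75,0</coordinates>
      </Point></Placemark>
    </Folder>
  </Document>
</kml>
"""

def comma_split(text):
    return text.split(",")

def row_iter_kml(file_obj):
    ns_map = {
        "ns0": "http://www.opengis.net/kml/2.2",
        "ns1": "http://www.google.com/kml/ext/2.2"}
    doc = XML.parse(file_obj)
    return (comma_split(coordinates.text)
        for coordinates in
        doc.findall("./ns0:Document/ns0:Folder/ns0:Placemark/"
            "ns0:Point/ns0:coordinates", ns_map))

# StringIO behaves like an already opened text file.
rows = list(row_iter_kml(io.StringIO(SAMPLE_KML)))
for row in rows:
    print(row)
```

With a real file, the same call would sit inside a with statement that opens the file, so that the file is closed once the rows have been consumed.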
The function includes a simple static dict object, ns_map, that provides the namespace
mapping information for the XML tags we'll be searching. This dictionary will be used
by the XML ElementTree.findall() method.
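To see why the mapping is needed, it may help to look at how ElementTree stores namespaced tags. The following sketch uses a tiny made-up document; the names KML_NS, by_prefix, by_uri, and unqualified are ours, chosen for illustration:

```python
import xml.etree.ElementTree as XML

KML_NS = "http://www.opengis.net/kml/2.2"

# A tiny, made-up document that declares the KML default namespace.
doc = XML.fromstring(
    f'<kml xmlns="{KML_NS}"><Document><name>demo</name></Document></kml>'
)
ns_map = {"ns0": KML_NS}

# Each tag is stored in fully qualified "{uri}local" form.
print(doc.tag)

# The prefix form relies on ns_map to expand "ns0:" into the full URI.
by_prefix = doc.findall("./ns0:Document/ns0:name", ns_map)

# The equivalent search with the URI spelled out in every step.
by_uri = doc.findall(f"./{{{KML_NS}}}Document/{{{KML_NS}}}name")

# An unqualified path does not match the namespaced tags.
unqualified = doc.findall("./Document/name")

print([e.text for e in by_prefix], [e.text for e in by_uri], unqualified)
```

The prefixes in ns_map are local shorthand only. They need not match whatever prefixes, if any, appear in the file itself, because matching is done on the namespace URI.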
The essence of the parsing is a generator expression that uses the sequence of tags
located by doc.findall(). The text of each tag is then processed by the
comma_split() function to tease the text value into its comma-separated components.
The comma_split() function is the functional version of the split() method
of a string, which is as follows:
def comma_split(text):
    return text.split(",")
We've used the functional wrapper to emphasize a slightly more uniform syntax.
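One way to read the "more uniform syntax" remark is that a plain function can be handed directly to higher-order functions such as map(), while the method form needs a lambda or a comprehension to supply the separator. A small sketch follows; the sample strings and variable names are made up for illustration:

```python
def comma_split(text):
    return text.split(",")

# Sample strings for illustration only.
texts = ["-76.33029518659048,37.54901619777347,0", "1,2,3"]

# Prefix (functional) form: the function is passed by name.
prefix_form = list(map(comma_split, texts))

# Method form: a lambda is needed to bind the "," argument.
lambda_form = list(map(lambda t: t.split(","), texts))

# Method form inside a comprehension.
comprehension_form = [t.split(",") for t in texts]

print(prefix_form)
```

All three produce the same list of lists. The difference lies only in how easily the operation composes with other functions.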