'0-596-00797-3': 'Python Cookbook, 2nd Edition',
'0-596-10046-9': 'Python in a Nutshell, 2nd Edition',
'0-596-15806-8': 'Learning Python, 4th Edition',
'0-596-15808-4': 'Python Pocket Reference, 4th Edition',
'0-596-15810-6': 'Programming Python, 4th Edition'}
ElementTree parsing
As a fourth option, the popular ElementTree package is a standard library tool for both
parsing and generating XML. As a parser, it’s essentially a more Pythonic type of
DOM—it parses documents into a tree of objects again, but the API for navigating the
tree is more lightweight, because it’s Python-specific.
ElementTree provides easy-to-use tools for parsing, changing, and generating XML
documents. For both parsing and generating, it represents documents as a tree of
Python “element” objects. Each element in the tree has a tag name, attribute dictionary,
text value, and sequence of child elements. The element object produced by a parse
can be navigated with normal Python loops for known structures, and with recursion
where arbitrary nesting is possible.
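To make the element model concrete, the following is a minimal sketch of a recursive walk over a parsed tree. The sample document and the walk function are invented here for illustration and are not part of the book's examples. The sketch relies only on the element's tag, attrib, and text attributes and on iteration over its children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration only
doc = """<catalog>
  <book isbn="0-000-00000-0">
    <title>Sample Title</title>
    <author>A. Writer</author>
  </book>
</catalog>"""

def walk(element, depth=0, out=None):
    """Collect (depth, tag, attributes, text) for an element and all descendants."""
    if out is None:
        out = []
    text = (element.text or '').strip()      # text may be None or whitespace only
    out.append((depth, element.tag, dict(element.attrib), text))
    for child in element:                    # iterating an element yields its children
        walk(child, depth + 1, out)          # recursion handles arbitrary nesting
    return out

root = ET.fromstring(doc)
rows = walk(root)
for depth, tag, attrib, text in rows:
    print('  ' * depth + tag, attrib, repr(text))
```

For a document whose structure is known in advance, nested for loops are usually enough. A recursive function like this one is only needed when elements may nest to an arbitrary depth.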
The ElementTree system began its life as a third-party extension, but it was largely
incorporated into Python’s standard library as the package xml.etree. Example 19-12
shows how to use it to parse our book catalog file one last time.
Example 19-12. PP4E\Lang\Xml\etreebook.py
"""
XML parsing: ElementTree (etree) provides a Python-based API for parsing/generating
"""
import pprint
from xml.etree.ElementTree import parse
mapping = {}
tree = parse('books.xml')
for B in tree.findall('book'):
    isbn = B.attrib['isbn']
    for T in B.findall('title'):
        mapping[isbn] = T.text
pprint.pprint(mapping)
When run, this script produces the same results as the SAX and DOM versions, but the
code required to extract the file’s details is noticeably simpler this time around:
C:\...\PP4E\Lang\Xml> python etreebook.py
{'0-596-00128-2': 'Python & XML',
'0-596-00797-3': 'Python Cookbook, 2nd Edition',
'0-596-10046-9': 'Python in a Nutshell, 2nd Edition',
'0-596-15806-8': 'Learning Python, 4th Edition',
'0-596-15808-4': 'Python Pocket Reference, 4th Edition',
'0-596-15810-6': 'Programming Python, 4th Edition'}
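Because ElementTree generates XML as well as parsing it, the reverse direction can be sketched in a few lines too. This is only an illustration under stated assumptions: the root tag name catalog is a guess, since the layout of books.xml is not shown here, and only two of the entries printed above are used to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Two entries taken from the output above, kept short for the sketch
mapping = {
    '0-596-00128-2': 'Python & XML',
    '0-596-15810-6': 'Programming Python, 4th Edition',
}

root = ET.Element('catalog')                       # root tag name is an assumption
for isbn, title in sorted(mapping.items()):
    book = ET.SubElement(root, 'book', isbn=isbn)  # keyword args become attributes
    ET.SubElement(book, 'title').text = title      # text is escaped on output

xml_text = ET.tostring(root, encoding='unicode')   # 'unicode' returns a str, not bytes
print(xml_text)

# Round trip: parse the generated text back into the same mapping
reparsed = ET.fromstring(xml_text)
rebuilt = {b.attrib['isbn']: b.findtext('title') for b in reparsed.findall('book')}
print(rebuilt == mapping)
```

Note that the ampersand in the first title is escaped automatically when the tree is serialized and restored when it is parsed again, so the code never has to handle XML entities by hand.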