Programming Python, 4th Edition, by Mark Lutz (text edition)

C:\...\PP4E\Lang\Xml> python saxbook.py
{'0-596-00128-2': 'Python & XML',
 '0-596-00797-3': 'Python Cookbook, 2nd Edition',
 '0-596-10046-9': 'Python in a Nutshell, 2nd Edition',
 '0-596-15806-8': 'Learning Python, 4th Edition',
 '0-596-15808-4': 'Python Pocket Reference, 4th Edition',
 '0-596-15810-6': 'Programming Python, 4th Edition'}

DOM parsing


The DOM parsing model for XML is perhaps simpler to understand: we simply traverse
a tree of objects after the parse. It might be less efficient for large documents, though,
if the document is parsed all at once ahead of time and stored in memory. DOM also
supports random access to document parts via tree fetches, nested loops for known
structures, and recursive traversals for arbitrary nesting; in SAX, we are limited to a
single linear parse. Example 19-11 is a DOM-based equivalent to the SAX parser of the
preceding section.
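
Before turning to that example, here is a minimal sketch of the two access styles
just described: a recursive traversal that copes with arbitrary nesting, and a direct
fetch of one part of the tree. The inline SAMPLE document, its isbn values, and the
walk function are assumptions made up for illustration; they are not part of the
book's examples package.

```python
import xml.dom.minidom
from xml.dom.minidom import Node

# Hypothetical inline document (not the book's books.xml)
SAMPLE = """<catalog>
  <book isbn="111">
    <title>First <em>nested</em> title</title>
  </book>
  <book isbn="222">
    <title>Second title</title>
  </book>
</catalog>"""

def walk(node, depth=0):
    """Recursively yield (depth, tag name) for each element at or below node."""
    if node.nodeType == Node.ELEMENT_NODE:
        yield depth, node.tagName
    for child in node.childNodes:              # text nodes have no children
        yield from walk(child, depth + 1)

doc = xml.dom.minidom.parseString(SAMPLE)      # whole tree is built up front
outline = list(walk(doc.documentElement))

# Random access: fetch the second book directly from the parsed tree
second = doc.getElementsByTagName("book")[1]
second_isbn = second.getAttribute("isbn")

for depth, tag in outline:
    print("  " * depth + tag)                  # indented outline of elements
print(second_isbn)
```

Because the tree stays in memory after the parse, both the traversal and the direct
fetch can be repeated in any order, which a single SAX pass does not allow.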


Example 19-11. PP4E\Lang\Xml\dombook.py


"""
XML parsing: DOM gives whole document to the application as a traversable object
"""


import pprint
import xml.dom.minidom
from xml.dom.minidom import Node


doc = xml.dom.minidom.parse("books.xml")          # load doc into object
                                                  # usually parsed up front
mapping = {}
for node in doc.getElementsByTagName("book"):     # traverse DOM object
    isbn = node.getAttribute("isbn")              # via DOM object API
    L = node.getElementsByTagName("title")
    for node2 in L:
        title = ""
        for node3 in node2.childNodes:
            if node3.nodeType == Node.TEXT_NODE:
                title += node3.data
        mapping[isbn] = title

# mapping now has the same value as in the SAX example
pprint.pprint(mapping)
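
To experiment with this traversal without a file on disk, the same steps can be
sketched against an in-memory string. This is only a sketch under stated
assumptions: the SAMPLE text is a hypothetical two-entry stand-in whose catalog
root and layout are guessed, and the helper names text_of and isbn_to_title are
invented here for readability.

```python
import xml.dom.minidom
from xml.dom.minidom import Node

# Hypothetical stand-in for books.xml; the real file's layout may differ
SAMPLE = """<catalog>
  <book isbn="0-596-00128-2">
    <title>Python &amp; XML</title>
  </book>
  <book isbn="0-596-15810-6">
    <title>Programming Python, 4th Edition</title>
  </book>
</catalog>"""

def text_of(element):
    """Concatenate the data of the element's direct text-node children."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == Node.TEXT_NODE)

def isbn_to_title(doc):
    """Map each book's isbn attribute to the text of its title element."""
    mapping = {}
    for book in doc.getElementsByTagName("book"):
        isbn = book.getAttribute("isbn")
        for title in book.getElementsByTagName("title"):
            mapping[isbn] = text_of(title)
    return mapping

doc = xml.dom.minidom.parseString(SAMPLE)      # parse from a string, not a file
mapping = isbn_to_title(doc)
print(mapping)
```

Factoring the text collection into a helper hides some of the DOM API's verbosity,
but the calls underneath are the same ones used in Example 19-11, which reads
books.xml from disk and inlines these steps.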


The output of dombook.py is the same as what we generated interactively for the SAX
parser. Here, though, it is built up by walking the document object tree after the parse
has finished, using method calls and attributes defined by the cross-language DOM
standard specification. This is both a strength and a potential weakness of DOM: its
API is language neutral, but it may seem a bit nonintuitive and verbose to some Python
programmers accustomed to simpler models:


C:\...\PP4E\Lang\Xml> python dombook.py
{'0-596-00128-2': 'Python & XML',
