Programming Python, 4th Edition, by Mark Lutz (text version)


Other XML topics


Naturally, there is much more to Python’s XML support than these simple examples
imply. In deference to space, though, here are pointers to XML resources in lieu of
additional examples:


Standard library
First, be sure to consult the Python library manual for more on the standard
library’s XML support tools. See the entries for re, xml.sax, xml.dom, and
xml.etree for more on this section’s examples.


PyXML SIG tools
You can also find Python XML tools and documentation at the XML Special
Interest Group (SIG) web page at http://www.python.org. This SIG is dedicated to
wedding XML technologies with Python, and it publishes free XML tools
independent of Python itself. Much of the standard library’s XML support originated
with this group’s work.


Third-party tools
You can also find free, third-party Python support tools for XML on the Web by
following links at the XML SIG’s web page. Of special interest, the 4Suite open
source package provides integrated tools for XML processing, including open
technologies such as DOM, SAX, RDF, XSLT, XInclude, XPointer, XLink, and
XPath.


Documentation
A variety of books have been published that specifically address XML and text
processing in Python. O’Reilly offers a book dedicated to the subject of XML
processing in Python, Python & XML, written by Christopher A. Jones and Fred L.
Drake, Jr.


As usual, be sure to also consult your favorite web search engine for more recent
developments on this front.


HTML Parsing in Action


Although more limited in scope, Python’s html.parser standard library module also
supports HTML-specific parsing, useful in “screen scraping” roles to extract
information from web pages. Among other things, this parser can be used to process Web
replies fetched with the urllib.request module we met in the Internet part of this book,
to extract plain text from HTML email messages, and more.
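
To make the plain-text extraction role concrete, here is a minimal sketch that uses only html.parser from the standard library. The names TextExtractor and html_to_text, and the sample page string, are hypothetical and invented for illustration; they are not part of this book's examples package. The sketch also assumes that dropping the contents of script and style elements is what you want, which may not suit every page.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping script/style content."""
    SKIP = {'script', 'style'}

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; for us
        super().__init__(convert_charrefs=True)
        self.parts = []          # text fragments found so far
        self.skip_depth = 0      # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # keep only non-blank text that is outside skipped elements
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the text fragments of an HTML string, joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.parts)

# Made-up sample input; a real page might come from urllib.request instead
page = ('<html><head><title>Demo</title><style>p {color: red}</style></head>'
        '<body><h1>Hello</h1><p>Spam &amp; <b>eggs</b></p></body></html>')

if __name__ == '__main__':
    print(html_to_text(page))
```

Joining fragments with single spaces is a simplification: it discards the paragraph and line structure that a fuller converter would probably try to preserve.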


The html.parser module has an API reminiscent of the XML SAX model of the prior
section: it provides a parser class that we subclass to intercept tags and their data
during a parse. Unlike SAX, we don’t provide a separate handler class, but extend the
parser class directly. Here’s a quick interactive example to demonstrate the basics (I
copied all of this section’s code into file htmlparser.py in the examples package if you
wish to experiment with it yourself):

