Programming Python, 4th Edition, by Mark Lutz (text edition)

grammar definitions. Better yet, we could use an integration that already exists—
interfaces to such common parser generators are freely available in the open source
domain (run a web search for up-to-date details and links).
In addition, a number of Python-specific parsing systems are available on the Web.
Among them: PLY is an implementation of lex and yacc parsing tools in and for
Python; the kwParsing system is a parser generator written in Python; PyParsing
is a pure-Python class library that makes it easy to build recursive-descent parsers
quickly; and the SPARK toolkit is a lightweight system that employs the Earley
algorithm to work around technical problems with LALR parser generation (if you
don’t know what that means, you probably don’t need to care).
Of special interest to this chapter, YAPPS (Yet Another Python Parser System) is a
parser generator written in Python. It uses supplied grammar rules to generate
human-readable Python code that implements a recursive descent parser; that is,
it’s Python code that generates Python code. The parsers generated by YAPPS look
much like (and were inspired by) the handcoded custom expression parsers shown
in the next section. YAPPS creates LL(1) parsers, which are not as powerful as
LALR parsers but are sufficient for many language tasks. For more on YAPPS, see
http://theory.stanford.edu/~amitp/Yapps or search the Web at large.

Natural language processing
Even more demanding language analysis tasks require techniques developed in
artificial intelligence research, such as semantic analysis and machine learning. For
instance, the Natural Language Toolkit, or NLTK, is an open source suite of Python
libraries and programs for symbolic and statistical natural language processing. It
applies linguistic techniques to textual data, and it can be used in the development
of natural language recognition software and systems. For much more on this
subject, be sure to also see the O’Reilly book Natural Language Processing with
Python, which explores, among other things, ways to use NLTK in Python. Not every
system’s users will pose questions in a natural language, of course, but there are
many applications that can make good use of such tools.


Though widely useful, parser generator systems and natural language analysis toolkits
are too complex for us to cover in any sort of useful detail in this text. Consult
http://python.org/ or search the Web for more information on language analysis tools available
for use in Python programs. For the purposes of this chapter, let’s move on to explore
a more basic and manual approach that illustrates concepts underlying the domain—
recursive descent parsing.
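
To make the idea concrete before going further, here is a minimal sketch of a recursive descent parser with one token of lookahead, the LL(1) style mentioned above. It is not the parser developed in this book and not output produced by YAPPS; the grammar, the Parser class, and the tokenize function are all hypothetical names chosen for illustration. Each grammar rule becomes one method, and each method calls the methods for the rules it refers to.

```python
import re

# Hypothetical grammar, assumed only for this illustration:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

# A number, or any other single character, with leading whitespace skipped.
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|(.))")

def tokenize(text):
    """Split text into ('num', value) and ('op', char) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", float(number)))
        elif op.strip():                      # ignore stray whitespace
            tokens.append(("op", op))
    tokens.append(("end", None))              # sentinel marks end of input
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Look at the next token without consuming it."""
        return self.tokens[self.pos]

    def advance(self):
        """Consume and return the next token."""
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        result = self.expr()
        if self.peek()[0] != "end":
            raise SyntaxError("unexpected trailing input")
        return result

    def expr(self):
        # expr -> term (('+' | '-') term)*
        value = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            _, op = self.advance()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term -> factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            _, op = self.advance()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor -> NUMBER | '(' expr ')'
        kind, value = self.advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            result = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return result
        raise SyntaxError("unexpected token: %r" % (value,))
```

With these definitions, Parser("2 + 3 * 4").parse() returns 14.0, because the term method consumes the multiplication before expr sees the addition. Operator precedence falls out of the way the rule methods nest, which is the main appeal of the technique. This sketch evaluates as it parses; a fuller design might build a parse tree instead.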


Lesson 2: Don’t Reinvent the Wheel (Usually)
Speaking of parser generators, to use some of these tools in Python programs, you’ll
need an extension module that integrates them. The first step in such scenarios should
always be to see whether the extension already exists as freely available software. Especially
for common tools like these, chances are that someone else has already implemented
an integration that you can use off-the-shelf instead of writing one from scratch.
