4.5 Bottom-Up Parsing 193
bottom-up parser is the stream of tokens of a program and the output is a
sequence of grammar rules. The shift action moves the next input token onto
the parser’s stack. A reduce action replaces an RHS (the handle) on top of the
parser’s stack by its corresponding LHS. Every parser for a programming lan-
guage is a pushdown automaton (PDA), because a PDA is a recognizer for
a context-free language. You need not be intimate with PDAs to understand
how a bottom-up parser works, although it helps. A PDA is a very simple
mathematical machine that scans strings of symbols from left to right. A PDA
is so named because it uses a pushdown stack as its memory. PDAs can be used
as recognizers for context-free languages. Given a string of symbols over the
alphabet of a context-free language, a PDA that is designed for the purpose
can determine whether the string is or is not a sentence in the language. In
the process, the PDA can produce the information needed to construct a parse
tree for the sentence.
With a PDA, the input string is examined, one symbol at a time, left to
right. The input is treated very much as if it were stored in another stack,
because the PDA never sees more than the leftmost symbol of the input.
Note that a recursive-descent parser is also a PDA. In that case, the stack
is that of the run-time system, which records subprogram calls (among other
things), which correspond to the nonterminals of the grammar.
4.5.3 LR Parsers
Many different bottom-up parsing algorithms have been devised. Most of
them are variations of a process called LR. LR parsers use a relatively small
program and a parsing table that is built for a specific programming lan-
guage. The original LR algorithm was designed by Donald Knuth (Knuth,
1965). This algorithm, which is sometimes called canonical LR, was not
used in the years immediately following its publication because producing
the required parsing table required large amounts of computer time and
memory. Subsequently, several variations on the canonical LR table con-
struction process were developed (DeRemer, 1971; DeRemer and Pennello,
1982). These are characterized by two properties: (1) They require far less
computer resources to produce the required parsing table than the canoni-
cal LR algorithm, and (2) they work on smaller classes of grammars than the
canonical LR algorithm.
There are three advantages to LR parsers:
- They can be built for all programming languages.
- They can detect syntax errors as soon as it is possible in a left-to-right
scan. - The LR class of grammars is a proper superset of the class parsable by
LL parsers (for example, many left recursive grammars are LR, but
none are LL).
The only disadvantage of LR parsing is that it is difficult to produce by hand
the parsing table for a given grammar for a complete programming language.