expr-tail -> ^ [END, ) ]
expr-tail -> '+' <factor> <expr-tail> [+]
expr-tail -> '-' <factor> <expr-tail> [-]
factor -> <term> <factor-tail> [number, variable, ( ]
factor-tail -> ^ [+, -, END, ) ]
factor-tail -> '*' <term> <factor-tail> [*]
factor-tail -> '/' <term> <factor-tail> [/]
term -> <number> [number]
term -> <variable> [variable]
term -> '(' <expr> ')' [(]
tokens: (, ), num, var, -, +, /, *, set, end
This is a fairly typical grammar for a simple expression language, and it allows for
arbitrary expression nesting (some example expressions appear at the end of the test
parser module listing in Example 19-15). Strings to be parsed are either an expression
or an assignment to a variable name (set). Expressions involve numbers, variables, and
the operators +, −, , and /. Because factor is nested in expr in the grammar, and /
have higher precedence (i.e., they bind tighter) than + and −. Expressions can be en-
closed in parentheses to override precedence, and all operators are left associative—
that is, they group on the left (e.g., 1-2-3 is treated the same as (1-2)-3).
Tokens are just the most primitive components of the expression language. Each gram-
mar rule listed earlier is followed in square brackets by a list of tokens used to select it.
In recursive descent parsing, we determine the set of tokens that can possibly start a
rule’s substring, and we use that information to predict which rule will work ahead of
time. For rules that iterate (the -tail rules), we use the set of possibly following tokens
to know when to stop. Typically, tokens are recognized by a string processor (a scan-
ner), and a higher-level processor (a parser) uses the token stream to predict and step
through grammar rules and substrings.
The Parser’s Code
The system is structured as two modules, holding two classes:
- The scanner handles low-level character-by-character analysis.
- The parser embeds a scanner and handles higher-level grammar analysis.
The parser is also responsible for computing the expression’s value and testing the
system. In this version, the parser evaluates the expression while it is being parsed. To
use the system, we create a parser with an input string and call its parse method. We
can also call parse again later with a new expression string.
There’s a deliberate division of labor here. The scanner extracts tokens from the string,
but it knows nothing about the grammar. The parser handles the grammar, but it is
Custom Language Parsers| 1441