ugh.book

(singke) #1
Programming in Plato’s Cave 177

Unix programmers are like mathematicians. It’s a curious phenomenon we
call “Programming by Implication.” Once we were talking to a Unix pro-
grammer about how nice it would be to have a utility that could examine a
program and then answer questions such as: “What functions call function
foo?” or “Which functions modify the global variable bar?” He agreed that
it would be useful and then observed that, “You could write a program like
that.”


To be fair, the reason he said “You could write a program like that” instead
of actually writing the program is that some properties of the C language
and the Unix “Programming Environment” combine synergistically to
make writing such a utility a pain of epic proportion.


You may think we exaggerate, and that this utility could be easily imple-
mented by writing a number of small utility programs and then piping them
together, but we’re not, and it can’t.


Parsing with yacc


“Yacc” was what I felt like doing after I learned how to use yacc(1).

—Anonymous

“YACC” stands for Yet Another Compiler Compiler. It takes a context-
free grammar describing a language to be parsed and computes a state
machine for a universal pushdown automaton. When the state machine is
run, one gets a parser for the language. The theory is well understood since
one of the big research problems in the olden days of computer science was
reducing the time it took to write compilers.


This scheme has one small problem: most programming languages are not
context-free. Thus, yacc users must specify code fragments to be run at
certain state transitions to handle the cases where context-free grammars
blow up. (Type checking is usually done this way.) Most C compilers
today have a yacc-generated parser; the yacc grammar for GCC 2.1 (an
otherwise fine compiler written by the Free Software Foundation) is about
1650 lines long. The actual code output by yacc and the code for the uni-
versal pushdown automaton that runs the yacc output are much larger.


Some programming languages are easier to parse. Lisp, for example, can
be parsed by a recursive-descent parser. “Recursive-descent” is computer
jargon for “simple enough to write on a liter of Coke.” As an experiment,

Free download pdf