Linux Kernel Architecture

(Jacob Rumans) #1
Mauerer app03.tex V1 - 09/04/2008 6:11pm Page 1176

Appendix C: Notes on C.....................................................


C.1.1 From Source Code to Machine Program


The work of compilers can be divided into several phases, as the following overview demonstrates:

❑ Preprocessing— All pre-processor actions are performed in this phase. Depending on the
compiler version, this phase is supported by an external utility (cpp) or by special library
functions — both are initiated automatically by the compiler. On completion of preprocessing,
there is only one (large) input file generated from the source file and all header files are included
using the#includedirective. The compiler itself is then no longer required to take account of
the distributed structure of C programs over several files.
❑ Scanning and Parsing— The syntax of a programming language can be described by means
of grammatical rules that are similar to those of a natural language such as English but that
must understandably be much more restrictive. (Although the existence of several alternatives
to represent one and the same fact contribute greatly to the appeal and subtlety of a language,
ambiguity must be avoided at all costs in programming languages.) This phase usually com-
prises two closely linked tasks. Thescanneranalyzes the source text character-by-character and
looks for keywords of the programming language. Theparsertakes the input stream supplied by
the scanner and already abstracted from source text representation and checks that the structures
it detects are correct in terms of the grammar rulesof the language. It also creates data structures
in computer memory that are a further abstraction of the source code and are designed for pro-
cessing by a computer (in contrast to the actual source code of the program that should be as
easy as possible to read and manipulate by human beings).
❑ Intermediate Code Generation— A further step along the path toward the final machine
code converts theparse tree(i.e., the data structure created in memory) set up by the
scanner and parser into another language known as theregister transfer language(RTL).
This is a kind of assembly language for a hypothetical machine. This language can be
optimized — independently of the target processor for the most part. However, this does
not mean that the RTL code generated in this phase of the compilation process is the same
for all target processors. Depending on the architecture, a range of assembler statements are
available — and this fact must be taken into account during RTL generation.
The individual statements of the RTL are already on a very low level and are a step away from
the high-level C language on the path to the assembly language. Their main task is to manipulate
register values to support execution of the compiled program. There are, of course, also condi-
tional statements and other mechanisms to control program flow. However, this intermediate
code still includes various elements and structures common to higher-level programming lan-
guages (these arenotspecific to a particular language such as C, Pascal, etc.) that do not appear
in a pure assembly language.
❑ Optimization— The most compute-intensive phase of program compilation is optimization
of the intermediate code in the RTL language. The reasons why programs are optimized are
clear. But how is this done by the compiler? Because the mechanisms used are generally not
only complex but also sophisticated and even devious (subtle details must always be taken into
account), it would not be difficult to write a long tome on optimization techniques alone, and a
further one on their usage in the GCC. Nevertheless, this appendix illustrates at least some of
the techniques employed. All optimization options are based on ideas that initially appear to
be relatively simple. However, in practice (and in theory) they are difficult to implement. Such
options include, above all, the simplification of arithmetic expressions (algebraic rewriting of
terms into expressions that can be computed more efficiently and/or with a less-intensive use of
memory), elimination of dead code (parts of code that cannot be reached by the program flow),
Free download pdf