Reversing: The Hacker's Guide to Reverse Engineering


Compiler Architecture


The average compiler consists of three basic components. The front end is
responsible for deciphering the original program text and for ensuring that its
syntax is correct and in accordance with the language’s specifications. The
optimizer improves the program, typically by making it faster or smaller, while
preserving its original meaning. Finally, the back end is responsible for
generating the platform-specific binary from the optimized code emitted by the
optimizer. The
following sections discuss each of these components in depth.
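
As a rough illustration of how the three components hand work to one another,
here is a minimal sketch in Python. Everything in it is hypothetical: the toy
"language" consists only of sums such as 2 + x + 3, the function names are
invented for this example, and the output is pseudo-assembly for an imaginary
accumulator machine rather than code for any real platform.

```python
# Toy three-stage pipeline: front end -> optimizer -> back end.
# The language (integer sums) and all names here are hypothetical.

def front_end(source):
    """Validate the text and convert it to a simple list of operands."""
    operands = [part.strip() for part in source.split("+")]
    for op in operands:
        if not (op.isdecimal() or op.isidentifier()):
            raise SyntaxError(f"unexpected operand: {op!r}")
    return operands

def optimizer(operands):
    """Fold all constant operands into one; the value of the sum is unchanged."""
    constant = sum(int(op) for op in operands if op.isdecimal())
    variables = [op for op in operands if not op.isdecimal()]
    folded = list(variables)
    if constant != 0 or not variables:
        folded.append(str(constant))
    return folded

def back_end(operands):
    """Emit pseudo-assembly for an imaginary accumulator machine."""
    lines = [f"LOAD {operands[0]}"]
    lines += [f"ADD {op}" for op in operands[1:]]
    return lines

def compile_sum(source):
    return back_end(optimizer(front_end(source)))

print("\n".join(compile_sum("2 + x + 3")))
```

The point of the sketch is only the division of labor: the front end rejects
invalid text, the optimizer rewrites the program without changing what it
computes, and the back end is the only stage that knows about the target.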


Front End

The compilation process begins at the compiler’s front end and includes several
steps that analyze the high-level language source code. Compilation usually
starts with a process called lexical analysis or scanning, in which the compiler
goes over the source file and scans the text for individual tokens within it.
Tokens are the textual symbols that make up the code, so that in a line such as:


if (Remainder != 0)

the symbols if, (, Remainder, and != are all tokens. While scanning for
tokens, the lexical analyzer confirms that the tokens produce legal “sentences”
in accordance with the rules of the language. For example, the lexical analyzer
might check that the token if is followed by a (, which is a requirement in
some languages (many compilers defer checks of this kind to a separate parsing
step). Along with each word, the analyzer stores the word’s meaning
within the specific context. This can be thought of as a very simple version
of how humans break sentences down in natural languages. A sentence is
divided into several logical parts, and words can only take on actual meaning
when placed into context. Similarly, lexical analysis involves confirming the
legality of each token within the current context, and marking that context. If
a token is found that isn’t expected within the current context, the compiler
reports an error.
A compiler’s front end is probably the one component that is least relevant
to reversers, because it is primarily a conversion step that rarely modifies the
program’s meaning in any way—it merely verifies that it is valid and converts
it to the compiler’s intermediate representation.
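
To make the scanning step described above more concrete, the following is a
minimal sketch of a tokenizer, written in Python for brevity. The token classes,
the small keyword list, and the function names tokenize and
check_if_followed_by_paren are assumptions made for this illustration; they are
not taken from any particular compiler.

```python
import re

# Hypothetical token classes for a small C-like fragment.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"!=|==|<=|>=|[-+*/<>=!]"),
    ("PUNCT",      r"[(){};]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split source text into (kind, text) pairs, rejecting unknown characters."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens

def check_if_followed_by_paren(tokens):
    """Report an error when an 'if' keyword is not followed by '('."""
    for i, token in enumerate(tokens):
        if token == ("KEYWORD", "if"):
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            if following != ("PUNCT", "("):
                raise SyntaxError("expected '(' after 'if'")

tokens = tokenize("if (Remainder != 0)")
check_if_followed_by_paren(tokens)
for kind, text in tokens:
    print(kind, text)
```

Each token is stored together with its kind, which is a simple stand-in for the
"meaning within the specific context" that the analyzer records. The second
function mirrors the if-followed-by-( check mentioned above.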


Intermediate Representations

When you think about it, compilers are all about representations. A compiler’s
main role is to transform code from one representation to another. In the
process, a compiler must generate its own representation for the code. This
intermediate representation (or internal representation, as it’s sometimes called) is
useful for detecting any code errors, improving upon the code, and ultimately
for generating the resulting machine code.
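
As an illustration of what such a representation might look like, here is a
minimal Python sketch that expresses the earlier line, if (Remainder != 0), as a
short list of simple instructions. The instruction names (ne, branch_if_false,
label, call), the temporary t0, the label L_end, and the handle_remainder
routine are all hypothetical; real compilers define their own, usually much
richer, formats.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str          # operation name, e.g. "ne", "branch_if_false", "label"
    dest: str = ""   # result name, jump target, or callee
    lhs: str = ""    # first operand
    rhs: str = ""    # second operand

def lower_if_not_equal(variable, constant, body, label="L_end"):
    """Build a hypothetical IR for `if (variable != constant) { body }`."""
    return [
        Instr("ne", dest="t0", lhs=variable, rhs=str(constant)),
        Instr("branch_if_false", dest=label, lhs="t0"),
        *body,
        Instr("label", dest=label),
    ]

def dump(ir):
    """Render the IR as text, one instruction per line."""
    lines = []
    for ins in ir:
        if ins.op == "label":
            lines.append(f"{ins.dest}:")
        elif ins.op == "branch_if_false":
            lines.append(f"  if not {ins.lhs} goto {ins.dest}")
        elif ins.op == "call":
            lines.append(f"  call {ins.dest}")
        else:
            lines.append(f"  {ins.dest} = {ins.op} {ins.lhs}, {ins.rhs}")
    return "\n".join(lines)

ir = lower_if_not_equal("Remainder", 0, [Instr("call", dest="handle_remainder")])
print(dump(ir))
```

Because every instruction in this form does exactly one thing, later stages can
inspect, rearrange, or translate the code one step at a time, which is the
property that makes an intermediate form convenient to analyze and improve.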

