Reversing: The Hacker's Guide to Reverse Engineering

Compiler-generated code can be difficult to read. Sometimes it is so
different from the original structure of the program that it becomes difficult
to determine the software developer’s original intentions. A similar problem
happens with arithmetic sequences: they are often rearranged to make them more
efficient, and one ends up with an odd-looking sequence of arithmetic
operations that can be very difficult to comprehend. The bottom line is that
developing an understanding of the processes undertaken by compilers and the
way they “perceive” the code will help in eventually deciphering their output.
The following sections provide some background information on compilers
and how they operate, and describe the different stages that take place inside
the average compiler. Although these sections could be considered optional, I
would still recommend that you go over them at some point if you are not
familiar with basic compilation concepts. I firmly believe that reversers must
truly know their systems, and no one can truly claim to understand a system
without understanding how software is created and built.
It should be emphasized that compilers are extremely complex programs
that combine a variety of fields in computer science research and can have
millions of lines of code. The following sections are by no means
comprehensive; they merely scratch the surface. If you’d like to deepen your
knowledge of compilers and compiler optimizations, check out [Cooper] Keith D.
Cooper and Linda Torczon, Engineering a Compiler, Morgan Kaufmann Publishers,
2004, for a highly readable tutorial on compilation techniques, or [Muchnick]
Steven S. Muchnick, Advanced Compiler Design and Implementation, Morgan
Kaufmann Publishers, 1997, for a more detailed discussion of advanced
compilation topics such as optimization.

Defining a Compiler


At its most basic level, a compiler is a program that takes one representation
of a program as its input and produces a different representation of the same
program. In most cases, the input representation is a text file containing code
that complies with the specifications of a certain high-level programming
language. The output representation is usually a lower-level translation of the
same program. Such a lower-level representation is usually read by hardware or
software, and rarely by people. In short, compilers usually transform programs
from their high-level, human-readable form into a lower-level, machine-readable
form.
During the translation process, compilers usually go through numerous
improvement or optimization steps that take advantage of the compiler’s
“understanding” of the program and employ various algorithms to improve
the code’s efficiency. As I have already mentioned, these optimizations tend to
have a strong “side effect”: they seriously degrade the emitted code’s
readability. Compiler-generated code is simply not meant for human consumption.
