Properly choosing the intermediate representation of code in a compiler is
one of the most important decisions a compiler designer makes. The layout
heavily depends on what kind of source (high-level language) the compiler
takes as input, and what kind of object code the compiler spews out. Some
intermediate representations can be very close to a high-level language and
retain much of the program’s original structure. Such information can be useful
if advanced improvements and optimizations are to be performed on the
code. Other compilers use intermediate representations that are closer to
low-level assembly language. Such representations frequently strip away
much of the high-level structure embedded in the original code, and are suitable
for compiler designs that are more focused on the low-level details of the
code. Finally, it is not uncommon for compilers to have two or more intermediate
representations, each serving a different stage of the compilation process.
Optimizer
The optimizer is one of the primary reasons that reversers should understand
compilers (the other reason being the code-level optimizations performed in
the back end). Compiler optimizers
employ a wide variety of techniques for improving the efficiency of the code.
The two primary goals for optimizers are usually either generating the fastest
code possible or generating the smallest possible program binaries. Most
compilers can attempt to combine the two goals as much as possible.
Optimizations that take place in the optimizer are not processor-specific and
are generic improvements made to the original program’s code without any
relation to the specific platform to which the program is targeted. Regardless of
the specific optimizations that take place, optimizers must always preserve the
exact meaning of the original program and not change its behavior in any way.
The following sections briefly discuss different areas where optimizers can
improve a program. It is important to keep in mind that some of the
optimizations that strongly affect a program’s readability might come from the
processor-specific work that takes place in the back end, and not only from the
optimizer.
Code Structure
Optimizers frequently modify the structure of the code in order to make it
more efficient while preserving its meaning. For example, loops can often be
partially or fully unrolled. Unrolling a loop means that instead of repeating the
same chunk of code using a jump instruction, the code is simply duplicated, so
that the processor executes the copies one after another. This makes the resulting
binary larger, but it reduces (or, with full unrolling, completely eliminates) the need to manage a
counter and invoke conditional branches (which are fairly inefficient—see the