Reversing : The Hacker's Guide to Reverse Engineering

The control flow graph in Figure 13.3 is quite primitive. It is essentially a
graphical representation of the low-level control flow statements in the
program. It is important to perform this simple analysis at this early stage in
decompilation to correctly break the program into basic blocks. The process of
actually structuring these graphs into a representation closer to the one used
by high-level languages is performed later, during the control flow analysis
stage.
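
To make the idea of breaking a program into basic blocks concrete, here is a minimal Python sketch of the classic "leader" approach. It is only an illustration under stated assumptions, not the method of any particular decompiler: the instruction format (an opcode plus an optional jump target given as an instruction index) and the opcode names jmp, jcc, and ret are hypothetical.

```python
# Hypothetical instruction format: (opcode, target), where target is the
# index of the instruction a jump goes to, or None for non-jump instructions.
BRANCHES = ("jmp", "jcc")  # assumed names: unconditional / conditional jump


def find_leaders(instrs):
    """Return the sorted indices of instructions that start a basic block."""
    leaders = {0} if instrs else set()
    for i, (op, target) in enumerate(instrs):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)      # a jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # so does the instruction after a jump
        elif op == "ret" and i + 1 < len(instrs):
            leaders.add(i + 1)           # code after a return starts a new block
    return sorted(leaders)


def split_basic_blocks(instrs):
    """Group instruction indices into basic blocks, one list per block."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


if __name__ == "__main__":
    program = [
        ("cmp", None),   # 0
        ("jcc", 3),      # 1: conditional jump to instruction 3
        ("mov", None),   # 2
        ("add", None),   # 3
        ("ret", None),   # 4
    ]
    print(split_basic_blocks(program))
```

Each block produced this way has a single entry point and a single exit point. The edges between blocks (fall-through and jump targets) would then form a primitive control flow graph of the kind described above.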

Code Analysis


Strictly speaking, a decompiler doesn’t have an optimizing stage. After all,
you’re looking to produce a high-level language representation from a binary
executable, and not to “improve” the program in any way. On the contrary,
you want the output to match the original program as closely as possible. In
reality, the so-called optimizing, or code-improving, phase in a decompiler is
where the program is transformed from a low-level intermediate representation
to a higher-level intermediate representation that is ready to be transformed
into high-level language code. This process could actually be described as the
opposite of the compiler’s optimization process: you’re trying to undo many
of the compiler’s optimizations.
The code analysis stage is where much of the interesting stuff happens.
Decompilation literature is quite scarce, and there doesn’t seem to be an
official term for this stage, so I’ll just name it the code analysis stage, even though
some decompiler researchers simply call it the middle-end.
The code analysis stage starts with an intermediate representation of the
program that is fairly close to the original assembly language code. The
program is represented using an instruction set similar to the one discussed in the
previous section, but it still lacks any real expressions. The code analysis
process includes data-flow analysis, which is where these expressions are
formed; type analysis, which is where complex and primitive data types are
detected; and control flow analysis, which is where high-level control flow
constructs are recovered from the unstructured control flow graph created by
the front end. These stages are discussed in detail in the following sections.
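
As a rough illustration of what "forming expressions" means, the following Python sketch folds a short run of low-level instructions into a single high-level statement by tracking what each register holds symbolically. It is a toy under loud assumptions: the three-field instruction tuples and the opcode names (mov, add, sub, imul, store) are hypothetical stand-ins rather than the instruction set from the previous section, and the code handles only straight-line code with no branches and no memory aliasing.

```python
# Hypothetical low-level IR: (opcode, destination, source).
# "store" writes a register's value to a named variable.
OPS = {"add": "+", "sub": "-", "imul": "*"}  # assumed opcode names


def build_expressions(instrs):
    """Fold straight-line register operations into assignment statements."""
    regs = {}    # register name -> symbolic expression it currently holds
    stmts = []   # recovered high-level statements

    def value(operand):
        # A register is replaced by its tracked expression; anything else
        # (a variable name or a constant) is used as written.
        return regs.get(operand, operand)

    for op, dst, src in instrs:
        if op == "mov":
            regs[dst] = value(src)
        elif op in OPS:
            regs[dst] = f"({value(dst)} {OPS[op]} {value(src)})"
        elif op == "store":
            stmts.append(f"{dst} = {value(src)}")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stmts


if __name__ == "__main__":
    code = [
        ("mov", "eax", "a"),
        ("add", "eax", "b"),
        ("imul", "eax", "4"),
        ("store", "c", "eax"),
    ]
    print(build_expressions(code))
```

Four machine-level instructions collapse into one assignment once the register eax is recognized as a temporary. A real decompiler has to do this across basic blocks and in the presence of memory writes, which is why the full analysis is considerably more involved than this sketch.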

Data-Flow Analysis


Data-flow analysis is a critical stage in the decompilation process. This is
where the decompiler analyzes the individual, seemingly unrelated machine
instructions and makes the necessary connections between them. The connections
are created by tracking the flow of data within those instructions and
analyzing the impact each individual instruction has on registers and memory
