Reversing : The Hacker's Guide to Reverse Engineering


The Front End


Decompiler front ends perform the opposite function of compiler back ends.
Compiler back ends take a compiler’s intermediate representation and convert
it to the target machine’s native assembly language, whereas decompiler front
ends take the same native assembly language and convert it back into the
decompiler’s intermediate representation. The first step in this process is to go
over the source executable byte by byte and decode each instruction, including
its operands. The decoded instructions are then converted into the
decompiler’s intermediate representation. This intermediate representation is
then gradually refined during the code analysis stage to prepare it for conversion
into a high-level language representation by the back end.
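
To make the decode-and-convert step concrete, here is a minimal sketch of a front end that walks a byte buffer and emits intermediate-representation statements. It is only an illustration: the IRStmt type, the lift function, and the IR notation are hypothetical names invented for this example, and the decoder recognizes just four one-byte IA-32 opcodes (mov r32, imm32; inc r32; nop; ret) as they are encoded in 32-bit mode. A real front end must handle the full instruction encoding.

```python
import struct
from dataclasses import dataclass
from typing import List

# IA-32 general-purpose registers in their encoding order.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


@dataclass
class IRStmt:
    """One statement of a hypothetical, architecture-neutral IR."""
    op: str
    dst: str = ""
    src: str = ""


def lift(code: bytes) -> List[IRStmt]:
    """Decode a byte buffer instruction by instruction into IR statements."""
    ir: List[IRStmt] = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if 0xB8 <= opcode <= 0xBF:
            # mov r32, imm32: opcode byte followed by a little-endian dword.
            (imm,) = struct.unpack_from("<I", code, pos + 1)
            ir.append(IRStmt("assign", REGS[opcode - 0xB8], hex(imm)))
            pos += 5
        elif 0x40 <= opcode <= 0x47:
            # inc r32 (one-byte form, valid in 32-bit mode).
            reg = REGS[opcode - 0x40]
            ir.append(IRStmt("assign", reg, f"{reg} + 1"))
            pos += 1
        elif opcode == 0x90:
            # nop carries no meaning for the IR, so it is dropped.
            pos += 1
        elif opcode == 0xC3:
            # ret
            ir.append(IRStmt("return"))
            pos += 1
        else:
            raise ValueError(f"unsupported opcode {opcode:#04x} at offset {pos}")
    return ir


if __name__ == "__main__":
    # Bytes for: mov eax, 42 / inc eax / ret
    for stmt in lift(bytes.fromhex("b82a00000040c3")):
        print(stmt)
```

Note that the IR statements no longer mention opcodes or encodings; they only record what each instruction does, which is the form the later analysis stages work on.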


Some decompilers don’t actually go through the process of disassembling the
source executable but instead require the user to run it through a disassembler
(such as IDA Pro). The disassembler produces a textual representation of the
source program which can then be read and analyzed by the decompiler. This
does not directly affect the results of the decompilation process but merely
creates a minor inconvenience for the user.

The following sections discuss the individual stages that take place inside a
decompiler’s front end.


Semantic Analysis


A decompiler front end starts out by simply scanning the individual instructions
and converting them into the decompiler’s intermediate representation,
but it doesn’t end there. Directly translating individual instructions often has
little value in itself, because some of these instructions only make sense
together, as a sequence. There are many architecture-specific sequences that
are made to overcome certain limitations of the specific architecture. The
front end must properly resolve these types of sequences and correctly translate
them into the intermediate representation, while eliminating all of the
architecture-specific details.
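
As a small illustration of resolving such a sequence, the sketch below folds one common IA-32 idiom: because the general-purpose registers are only 32 bits wide, a 64-bit addition is performed as an add on the low halves followed by an adc (add with carry) on the high halves. The instruction tuples, the lift_with_idioms function, and the textual IR are assumptions made for this example, not the format of any particular decompiler, and the pattern match is deliberately naive: it only looks for an adc immediately after an add.

```python
from typing import List, Tuple

# Hypothetical pre-decoded instruction: (mnemonic, destination, source).
Insn = Tuple[str, str, str]

# One-to-one translations for instructions that make sense on their own.
SIMPLE = {
    "mov": "{d} = {s}",
    "add": "{d} = {d} + {s}",
    "sub": "{d} = {d} - {s}",
}


def lift_with_idioms(insns: List[Insn]) -> List[str]:
    """Translate instructions to a textual IR, folding add/adc pairs."""
    ir: List[str] = []
    i = 0
    while i < len(insns):
        mnem, dst, src = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if mnem == "add" and nxt is not None and nxt[0] == "adc":
            # add lo, lo2 / adc hi, hi2 is a single 64-bit addition on the
            # register pairs hi:lo and hi2:lo2; the carry flag disappears.
            _, hi_dst, hi_src = nxt
            ir.append(f"{hi_dst}:{dst} = {hi_dst}:{dst} + {hi_src}:{src}")
            i += 2
        elif mnem in SIMPLE:
            ir.append(SIMPLE[mnem].format(d=dst, s=src))
            i += 1
        else:
            raise ValueError(f"no translation for {mnem!r}")
    return ir


if __name__ == "__main__":
    sequence = [
        ("mov", "esi", "eax"),
        ("add", "eax", "ecx"),  # low 32 bits
        ("adc", "edx", "ebx"),  # high 32 bits plus carry
    ]
    for line in lift_with_idioms(sequence):
        print(line)
```

Translated one instruction at a time, the adc would force the IR to model the carry flag explicitly. Recognizing the pair lets the IR express a plain 64-bit addition with no trace of the 32-bit register limitation.
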

Let’s take a look at an example of such a sequence. In the early days of the
IA-32 architecture, the floating-point unit was not an integral part of the
processor, and was actually implemented on a separate chip (typically referred
to as the math coprocessor) that had its own socket on the motherboard. This
meant that the two instruction sets were highly isolated from one another,
which imposed some limitations. For example, to compare two floating-point
values, one couldn’t just compare and conditionally branch using the standard
conditional branch instructions. The problem was that the math coprocessor

