Reversing : The Hacker's Guide to Reverse Engineering

Generally speaking, intermediate representations consist of tiny instruction
sets, as opposed to the huge instruction sets of some processor architectures
such as IA-32. Tiny instruction sets are possible because almost every
instruction can carry complex expressions as its operands.
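To make the idea of complex expressions concrete, here is a minimal sketch of an expression tree that a single intermediate instruction could carry as an operand. The class names (Const, Reg, Mem, BinOp) and the show helper are hypothetical and chosen only for illustration; they are not taken from any particular decompiler.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Mem:
    address: "Expr"  # memory operand whose address is itself an expression


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Reg, Mem, BinOp]


def show(e: Expr) -> str:
    """Render an expression tree as text (illustrative helper)."""
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Reg):
        return e.name
    if isinstance(e, Mem):
        return f"[{show(e.address)}]"
    return f"({show(e.left)} {e.op} {show(e.right)})"


# One assignment whose right-hand side is a whole tree rather than
# a single register or constant.
rhs = BinOp(
    "+",
    Mem(BinOp("+", Reg("ebp"), Const(8))),
    BinOp("*", Reg("ecx"), Const(4)),
)
text = f"eax = {show(rhs)}"
print(text)
```

Because one operand can hold a tree of any depth, a single assignment instruction can stand in for what would otherwise be a load, a multiply, and an add in native code.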
The following is a generic description of the instruction set typically used by
decompilers. Notice that this instruction set is meant to be usable throughout
the decompilation process, so it can directly represent both a low-level form
that is very similar to the original assembly language code and a high-level
form that can be translated into high-level language code.
Assignment: This is a very generic instruction that represents an assignment
operation into a register, variable, or other memory location (such as a
global variable). An assignment instruction can typically contain complex
expressions on either side.
Push: Push a value onto the stack. Again, the value being pushed can be any
kind of complex expression. These instructions are generally eliminated
during data-flow analysis since they have no direct equivalent in high-level
representations.
Pop: Pop a value from the stack. These instructions are generally eliminated
during data-flow analysis since they have no direct equivalent in high-level
representations.
Call: Call a subroutine and pass the listed parameters. Each parameter can be
represented using a complex expression. Keep in mind that to obtain such a
list of parameters, a decompiler would have to perform significant analysis
of the low-level code.
Ret: Return from a subroutine. Typically supports a complex expression to
represent the procedure's return value.
Branch: A branch instruction compares two operands using a specified
condition code and jumps to the specified address if the comparison evaluates
to True. The comparison is performed on two expression trees, where each tree
can represent anything from a trivial expression (such as a constant) to a
complex expression. Notice how this is a higher-level representation of what
would require several instructions in native assembly language. It is a good
example of how the intermediate representation has the flexibility to show
both an assembly-language-like low-level view of the code and a higher-level
view that is closer to a high-level language.
Unconditional Jump: An unconditional jump is a direct translation of the
unconditional jump instruction in the original program. It is used during
the construction of the control flow graph. What each unconditional jump
means in terms of high-level constructs is determined during the control
flow analysis stage.
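
The instruction set described above could be sketched as a handful of small record types. The following is only an illustration under stated assumptions: every class and function name is hypothetical, expressions are kept as plain strings rather than trees to keep the sketch short, and fold_pushes_into_calls is a toy stand-in for real data-flow analysis. It assumes that all pushes immediately preceding a call are that call's parameters and that they were pushed right to left, which a real decompiler would have to establish by tracking the stack and the calling convention.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Assign:
    dest: str
    src: str


@dataclass
class Push:
    value: str


@dataclass
class Pop:
    dest: str


@dataclass
class Call:
    target: str
    params: Tuple[str, ...] = ()


@dataclass
class Ret:
    value: Optional[str] = None


@dataclass
class Branch:
    cond: str    # condition code, such as "==" or "<"
    left: str    # first operand expression
    right: str   # second operand expression
    target: str  # jump destination if the comparison is True


@dataclass
class Jump:
    target: str


def fold_pushes_into_calls(code):
    """Attach pending Push values to the next Call as its parameters.

    ASSUMPTION: pushes directly before a call are its arguments, pushed
    right to left. Pushes followed by anything else are left untouched.
    """
    out, pending = [], []
    for ins in code:
        if isinstance(ins, Push):
            pending.append(ins.value)
        elif isinstance(ins, Call) and not ins.params:
            # The last value pushed becomes the first parameter.
            out.append(Call(ins.target, tuple(reversed(pending))))
            pending = []
        else:
            out.extend(Push(v) for v in pending)  # could not attribute these
            pending = []
            out.append(ins)
    out.extend(Push(v) for v in pending)
    return out


low_level = [
    Push("ebx"),
    Push("[ebp + 8]"),
    Call("sub_401000"),
    Branch("==", "eax", "0", "loc_401050"),
    Ret("eax"),
]
high_level = fold_pushes_into_calls(low_level)
for ins in high_level:
    print(ins)
```

After the pass, the two Push instructions are gone and the Call carries an explicit parameter list, while the Branch already expresses a comparison and a conditional jump as one instruction.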

