Reversing : The Hacker's Guide to Reverse Engineering


Type Analysis


Depending on the specific platform for which the executable was created,
accurate type information is often not available in binary executables, at least
not directly. Higher-level bytecodes such as Java bytecode and MSIL do
contain accurate type information for function arguments and class members
(MSIL also records local variable data types, which are not available in Java
bytecode), which greatly simplifies the decompilation process. Native IA-32
executables (and this is true for most other processor architectures as well)
contain no explicit type information whatsoever, but type information can be
extracted using techniques such as the constraint-based techniques described
in [Mycroft]. The following sections describe techniques for gathering simple
and complex data type information from executables.

Primitive Data Types

When a register is defined (that is, when a value is first loaded into it), there is
often no data type information available whatsoever. How can the decompiler
determine whether a certain variable contains a signed or an unsigned value, and
how long it is (char, short int, and so on)? Because many instructions completely
ignore primitive data types and operate in exactly the same way regardless
of whether a register contains a signed or an unsigned value, the
decompiler must scan the code for instructions that are type sensitive. There
are several examples of such instructions.
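
To make the distinction concrete, here is a minimal Python sketch that models a 32-bit register as nothing more than a bit pattern. The helper names (to_bits, as_signed, as_unsigned, add32, jg_taken, ja_taken) are hypothetical and exist only for illustration. It shows that an emulated ADD produces identical bits whether the operands are read as signed or unsigned, while the decisions JG and JA make after the same CMP diverge, which is exactly what makes branches useful to a decompiler.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register holds only a bit pattern


def to_bits(value: int) -> int:
    """Encode a signed or unsigned Python int as a 32-bit pattern."""
    return value & MASK32


def as_unsigned(bits: int) -> int:
    """Read a 32-bit pattern as an unsigned integer."""
    return bits & MASK32


def as_signed(bits: int) -> int:
    """Read a 32-bit pattern as a two's-complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def add32(a: int, b: int) -> int:
    """Emulate a 32-bit ADD: the same operation for signed and unsigned operands."""
    return (a + b) & MASK32


def jg_taken(a: int, b: int) -> bool:
    """Would JG branch after CMP a, b? (signed comparison)"""
    return as_signed(a) > as_signed(b)


def ja_taken(a: int, b: int) -> bool:
    """Would JA branch after CMP a, b? (unsigned comparison)"""
    return as_unsigned(a) > as_unsigned(b)


# ADD is type neutral: -7 + 3 and 4294967289 + 3 yield the same bits.
result = add32(to_bits(-7), 3)
print(hex(result), as_signed(result), as_unsigned(result))  # 0xfffffffc -4 4294967292

# The branch is type sensitive: 0xFFFFFFFF is -1 to JG but 4294967295 to JA.
print(jg_taken(0xFFFFFFFF, 1), ja_taken(0xFFFFFFFF, 1))  # False True
```

Nothing in the ADD result tells us which reading was intended; only the later use of the value (here, the choice of branch) does.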

For detecting signed versus unsigned values, the best method is to examine
conditional branches that are based on the value in question. That’s because
there are different groups of conditional branch instructions for signed and
unsigned operands (for more information on this topic, please see Appendix
A). For example, the JG instruction is used when comparing signed values,
while the JA instruction is used when comparing unsigned values. By locating
one of these instructions and associating it with a specific register, the
decompiler can propagate information on whether this register (and the origin
of its current value) contains a signed or an unsigned value.
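
The propagation step can be sketched in Python as well. This is a hypothetical, heavily simplified pass, not how any particular decompiler works: it assumes straight-line code in a made-up tuple IR, handles only MOV, CMP, and conditional jumps specially, and keys hints by operand name rather than by value version (a real implementation would typically work on SSA form). The signed group (JG, JGE, JL, JLE and their negated aliases) and the unsigned group (JA, JAE, JB, JBE and aliases) follow the standard IA-32 conditional jumps.

```python
# Standard IA-32 conditional jumps grouped by how they interpret CMP's flags.
SIGNED_JCC = {"jg", "jge", "jl", "jle", "jng", "jnge", "jnl", "jnle"}
UNSIGNED_JCC = {"ja", "jae", "jb", "jbe", "jna", "jnae", "jnb", "jnbe"}


def is_immediate(operand: str) -> bool:
    """True for numeric constants such as "10" or "0x20"."""
    try:
        int(operand, 0)
        return True
    except ValueError:
        return False


def kill(origin: dict, operand: str) -> None:
    """operand is overwritten: forget its origin and any copies traced to it."""
    origin.pop(operand, None)
    for reg in [r for r, src in origin.items() if src == operand]:
        del origin[reg]


def infer_signedness(code):
    """Return {operand: "signed" | "unsigned"} hints for straight-line code.

    `code` is a list of (mnemonic, *operands) tuples in a hypothetical IR.
    """
    origin = {}    # operand -> operand its current value was copied from
    compared = ()  # operands of the most recent CMP
    hints = {}

    for mnemonic, *ops in code:
        if mnemonic == "mov":
            dst, src = ops
            root = origin.get(src, src)  # follow earlier copies back to the source
            kill(origin, dst)
            if not is_immediate(root):
                origin[dst] = root
        elif mnemonic == "cmp":
            compared = tuple(ops)
        elif mnemonic in SIGNED_JCC or mnemonic in UNSIGNED_JCC:
            kind = "signed" if mnemonic in SIGNED_JCC else "unsigned"
            for op in compared:
                if is_immediate(op):
                    continue
                hints[op] = kind
                if op in origin:  # propagate to where the value came from
                    hints[origin[op]] = kind
        elif ops:
            # Conservatively assume any other instruction writes its first operand.
            kill(origin, ops[0])
    return hints


example = [
    ("mov", "eax", "[ebp+8]"),  # load an argument
    ("mov", "ecx", "eax"),      # copy it
    ("cmp", "ecx", "10"),
    ("jg", "skip"),             # signed branch
]
print(infer_signedness(example))  # {'ecx': 'signed', '[ebp+8]': 'signed'}
```

In the example, the JG marks not only ECX but also the stack slot [ebp+8] that ECX's value was copied from, which is how a single branch can reveal the signedness of a function parameter.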

The MOVZX and MOVSX instructions provide another source of information
regarding signed versus unsigned values. These instructions are used when
up-converting a value from 8 or 16 bits to 32 bits, or from 8 bits to 16 bits. Here,
the compiler must select the right instruction to reflect the exact data type
being up-converted: signed values must be sign extended using the MOVSX
instruction, while unsigned values must be zero extended using the MOVZX
instruction. These instructions also reveal the exact length of a variable (before
the up-conversion and after it). In cases where a shorter value is used without
being up-converted first, the exact size of a specific value is usually easy to
determine by observing which part of the register is being used (the full 32
bits, the lower 16 bits, and so on).
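
The two extension instructions can be emulated in a few lines of Python to show why they leak both the signedness and the width of the source operand: the same byte 0x80 becomes 0xFFFFFF80 under MOVSX but 0x00000080 under MOVZX. The movsx/movzx helpers and the C_TYPES table are hypothetical illustrations; the table maps each instruction and source width to the C type a decompiler could reasonably assign, assuming the usual IA-32 sizes of an 8-bit char and a 16-bit short.

```python
def movsx(value: int, src_bits: int, dst_bits: int = 32) -> int:
    """Sign-extend a src_bits-wide pattern to dst_bits, as MOVSX does."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):  # top bit set: the source is negative
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)


def movzx(value: int, src_bits: int) -> int:
    """Zero-extend a src_bits-wide pattern, as MOVZX does (upper bits cleared)."""
    return value & ((1 << src_bits) - 1)


# What each extension reveals about the source operand's C type.
C_TYPES = {
    ("movsx", 8): "signed char",
    ("movzx", 8): "unsigned char",
    ("movsx", 16): "short",
    ("movzx", 16): "unsigned short",
}

print(hex(movsx(0x80, 8)), hex(movzx(0x80, 8)))  # 0xffffff80 0x80
print(C_TYPES[("movsx", 8)])                      # signed char
```

The src_bits argument corresponds to the size of the source operand (for example, AL versus AX), and dst_bits to the destination register, so one instruction yields both the "before" and "after" lengths mentioned above.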

472 Chapter 13
