Reversing : The Hacker's Guide to Reverse Engineering


Once information regarding primitive data types is gathered, it makes a lot
of sense to propagate it globally, as discussed earlier. This is generally true in
native code decompilation—you want to take every tiny piece of relevant
information you have and capitalize on it as much as possible.


Complex Data Types

How do decompilers deal with more complex data constructs such as structs
and arrays? The first step is usually to establish that a certain register holds a
memory address. This is trivial once an instruction that uses the register’s
value as a memory address is spotted anywhere in the code. At that
point decompilers rely on the type of pointer arithmetic performed on the
address to determine whether it is a struct or array and to create a definition
for that data type.
Code sequences that add hard-coded constants to pointers and then access
the resulting memory address can typically be assumed to be accessing structs.
The process of determining the specific primitive data type of each member
can be performed using the primitive data type identification techniques from
above.
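
The struct pattern described above can be sketched in C. This is only an
illustration under stated assumptions: struct record, its members, and both
function names are hypothetical, and the offset 4 assumes the usual layout in
which two 32-bit members are packed back to back. The first function is what
the programmer wrote; the second mimics what a decompiler actually sees, a
base pointer plus a hard-coded constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct, invented for this illustration. */
struct record {
    int32_t  id;      /* expected at offset 0 */
    int32_t  flags;   /* expected at offset 4 */
    uint16_t length;  /* expected at offset 8 */
};

/* The raw version below hard-codes 4, so verify that layout assumption. */
static_assert(offsetof(struct record, flags) == 4, "unexpected struct layout");

/* Source-level view: an ordinary member access. */
int32_t read_flags_source(const struct record *r)
{
    return r->flags;
}

/* Decompiler's-eye view: add a hard-coded constant to the base pointer
 * and read 4 bytes from the resulting address. */
int32_t read_flags_raw(const void *base)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)base + 4, sizeof value);
    return value;
}
```

A constant added to a pointer, followed by a 4-byte read, is the hint that
lets a decompiler propose a struct with a 32-bit member at offset 4.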
Arrays are typically accessed in a slightly different way, without using hard-
coded offsets. Because array items are almost always accessed from inside a
loop, the most common access sequence for an array is to use an index and a
size multiplier. This makes arrays fairly easy to locate. Memory addresses that
are calculated by adding a value multiplied by a constant to the base memory
address are almost always arrays. Again, the data type of the array's items
can often be determined using the primitive data type identification
techniques described above.
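
The index-and-multiplier sequence can be sketched the same way. The function
names below are hypothetical, and the multiplier is simply the item size
(4 bytes for a 32-bit integer). The second function spells out the address
calculation that a compiler would normally fold into a single scaled
addressing mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Source-level view: items are accessed by index from inside a loop. */
int32_t sum_source(const int32_t *items, size_t count)
{
    int32_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += items[i];
    return total;
}

/* Decompiler's-eye view: address = base + index * item size. */
int32_t sum_raw(const void *base, size_t count)
{
    assert(base != NULL || count == 0);

    const unsigned char *p = base;
    int32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t item;
        /* The loop index multiplied by a constant is the telltale sign. */
        memcpy(&item, p + i * sizeof(int32_t), sizeof item);
        total += item;
    }
    return total;
}
```

The constant multiplier gives away the item size, and the changing index
shows that every item shares the same type, which is what separates this
pattern from a struct access.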


Sometimes a struct or array can be accessed without loading a dedicated
register with the address of the data structure. This typically happens when
the data structure resides on the stack and the code refers to a specific
array item or struct member. The compiler can then use hard-coded stack
offsets to access individual fields in the struct or items in the array, and
it becomes impossible to distinguish complex data types from simple local
variables that reside on the stack.
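
To see why the distinction disappears, consider the following sketch. The
struct and both functions are hypothetical, and the frame offsets in the
comments are illustrative only; actual offsets depend on the compiler, and an
optimizing compiler may keep all of these values in registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct, invented for this illustration. */
struct point {
    int32_t x;
    int32_t y;
};

/* Two 32-bit members are expected to occupy exactly 8 bytes. */
static_assert(sizeof(struct point) == 2 * sizeof(int32_t), "unexpected padding");

/* A struct that lives on the stack. Each member is reached through a
 * fixed offset from the frame, for example [frame-8] and [frame-4]. */
int32_t area_with_struct(int32_t w, int32_t h)
{
    struct point p;
    p.x = w;
    p.y = h;
    return p.x * p.y;
}

/* Two unrelated locals. They can occupy the very same stack slots and
 * be reached through the very same fixed offsets. */
int32_t area_with_locals(int32_t w, int32_t h)
{
    int32_t x = w;
    int32_t y = h;
    return x * y;
}
```

Because no base address is ever computed for p, nothing in the generated
code ties the two slots together, so a decompiler would most likely report
two independent locals in both functions.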

In some cases, it is simply not possible to tell whether a data structure is
an array or a struct. This is most typical with arrays that are accessed using
hard-coded indexes. In such cases compilers typically resort to a hard-coded
offset relative to the starting address of the array, which makes the sequence
look identical to a struct access sequence.
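
A short sketch makes the ambiguity concrete. All names here are hypothetical.
With 32-bit items, the hard-coded index 1 becomes the constant offset 4,
which is also the offset of the second member of a struct holding two 32-bit
fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct, invented for this illustration. */
struct pair {
    int32_t first;
    int32_t second;
};

/* The struct member sits exactly where array item 1 would sit. */
static_assert(offsetof(struct pair, second) == sizeof(int32_t),
              "unexpected struct layout");

/* Array accessed with a hard-coded index: base + 1 * 4. */
int32_t second_of_array(const int32_t *items)
{
    return items[1];
}

/* Struct member access: base + 4. */
int32_t second_of_struct(const struct pair *p)
{
    return p->second;
}

/* What both of the functions above reduce to in the compiled code. */
int32_t load_at_offset_4(const void *base)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)base + 4, sizeof value);
    return value;
}
```

Given only load_at_offset_4, there is no way to tell which of the two
source-level functions produced it.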

