Reversing : The Hacker's Guide to Reverse Engineering

Type Analysis

Depending on the specific platform for which the executable was created, accurate type information is often not available in binary executables, certainly not directly. Higher-level bytecodes such as the Java bytecode and MSIL do contain accurate type information for function arguments, and class members (MSIL also has local variable data types, which are not available in the Java bytecode), which greatly simplifies the decompilation process. Native IA-32 executables (and this is true for most other processor architectures as well) contain no explicit type information whatsoever, but type information can be extracted using techniques such as the constraint-based techniques described in [Mycroft]. The following sections describe techniques for gathering simple and complex data type information from executables.

Primitive Data Types

When a register is defined (that is, when a value is first loaded into it) there is often no data type information available whatsoever. How can the decompiler determine whether a certain variable contains a signed or unsigned value, and how long it is (char, short int, and so on)? Because many instructions com- pletely ignore primitive data types and operate in the exact same way regard- less of whether a register contains a signed or an unsigned value, the decompiler must scan the code for instructions that are type sensitive. There are several examples of such instructions. For detecting signed versus unsigned values, the best method is to examine conditional branches that are based on the value in question. That’s because there are different groups of conditional branch instructions for signed and unsigned operands (for more information on this topic please see Appendix A). For example, the JGinstruction is used when comparing signed values, while the JAinstruction is used when comparing unsigned values. By locating one of these instructions and associating it with a specific register, the decompiler can propagate information on whether this register (and the origin of its current value) contains a signed or an unsigned value. The MOVZXand MOVSXinstructions make another source of information regarding signed versus unsigned values. These instructions are used when up-converting a value from 8 or 16 bits to 32 bits or from 8 bits to 16 bits. Here, the compiler must select the right instruction to reflect the exact data type being up-converted. Signed values must be sign extended using the MOVSX instruction, while unsigned values must be zero extended, using the MOVZX instruction. These instructions also reveal the exact length of a variable (before the up-conversion and after it). In cases where a shorter value is used without being up-converted first, the exact size of a specific value is usually easy to determine by observing which part of the register is being used (the full 32 bits, the lower 16 bits, and so on).

472 Chapter 13

Reversing : The Hacker's Guide to Reverse Engineering

Type Analysis

Primitive Data Types

Get our desktop app

Company

Features

Documentation

Resources