Reversing : The Hacker's Guide to Reverse Engineering

■■ Once you have properly identified library calls you can benefit from the fact that you have accurate type information for these calls. This information can be propagated across the program (see the section on data type propagation earlier in this chapter) and greatly improve readability. Techniques for accurately identifying library calls were described in [Emmerik1]. Without getting into too much detail, the basic idea is to create signatures for library files. These signatures are simply byte sequences that represent the first few bytes of each function in the library. During decompilation the executable is scanned for these signatures (using a hash to make the process efficient), and the addresses of all library functions are recorded. The decompiler generally avoids decompilation of such functions and simply incorporates the details regarding their data types into the type-analysis process.
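The signature scan described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method from [Emmerik1]: the function names, the fixed 8-byte signature length, and the flat byte-string view of the executable are all hypothetical, and a Python dict stands in for the hash. A practical signature scheme would also have to cope with bytes that vary between builds (relocated addresses, for example), which this sketch ignores.

```python
SIG_LEN = 8  # assumed signature length; the text only says "the first few bytes"


def build_signatures(library_functions, sig_len=SIG_LEN):
    """Map the first sig_len bytes of each library function to its name.

    library_functions is a hypothetical dict of {name: machine-code bytes}.
    The dict is hash-based, so each later lookup is a constant-time probe.
    """
    table = {}
    for name, code in library_functions.items():
        if len(code) >= sig_len:
            table[bytes(code[:sig_len])] = name
    return table


def find_library_calls(image, table, base_address=0, sig_len=SIG_LEN):
    """Scan an executable image and record the address of every match.

    Returns {address: library function name}. Every byte offset is tried,
    which is simple but naive; it is only meant to show the idea.
    """
    found = {}
    for offset in range(len(image) - sig_len + 1):
        name = table.get(bytes(image[offset:offset + sig_len]))
        if name is not None:
            found[base_address + offset] = name
    return found
```

Given the resulting address map, a decompiler could skip the matched functions and feed their known prototypes into type analysis, as the text describes.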

The Back End

A decompiler’s back end is responsible for producing actual high-level language code from the processed code that is produced during the code analysis stage. The back end is language-specific, and just as a compiler’s back end is interchangeable to allow the compiler to support more than one processor architecture, so is a decompiler’s back end. It can be fairly easily replaced to get the decompiler to produce different high-level language outputs.

Let’s run a brief overview of how the back end produces code from the instructions in the intermediate representation. Instructions such as the assignment instruction, typically referred to as asgn, are fairly trivial to process because asgn already contains expression trees that simply need to be rendered as text. The call and ret instructions are also fairly trivial. During data-flow analysis the decompiler prepares an argument list for call instructions and locates the return value for the ret instruction. These are stored along with the instructions and must simply be printed in the correct syntax (depending on the target language) during the code-generation phase.

Probably the most complex step in this process is the creation of control flow statements from the structured control flow graph. Here, the decompiler must correctly choose the most suitable high-level language constructs for representing the control flow graph. For instance, most high-level languages support a variety of loop constructs such as “do...while”, “while...”, and “for...” loops. Additionally, depending on the specific language, the code might have unconditional jumps inside the loop body. These must be translated to keywords such as break or continue, assuming that such keywords (or ones equivalent to them) are supported in the target language.

Generating code for two-way or n-way conditionals is fairly straightforward at this point, considering that the conditions have been analyzed during


