■■ Once you have properly identified library calls you can benefit from the
fact that you have accurate type information for these calls. This infor-
mation can be propagated across the program (see the section on data
type propagation earlier in this chapter) and greatly improve readability.
Techniques for accurately identifying library calls were described in
[Emmerik1]. Without getting into too much detail, the basic idea is to create sig-
natures for library files. These signatures are simply byte sequences that repre-
sent the first few bytes of each function in the library. During decompilation the
executable is scanned for these signatures (using a hash to make the process
efficient), and the addresses of all library functions are recorded. The decom-
piler generally avoids decompilation of such functions and simply incorporates
the details regarding their data types into the type-analysis process.
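The scanning step can be illustrated with a minimal sketch. It is not the method from [Emmerik1]. It assumes a fixed signature length (SIG_LEN), a hypothetical dictionary of library function bytes, and a made-up load address. A Python dict serves as the hash table, so each position in the executable is checked with a single lookup.

```python
SIG_LEN = 8  # assumed signature length in bytes; real tools choose their own


def build_signature_table(library_functions):
    """Map the first SIG_LEN bytes of each library function to its name."""
    table = {}
    for name, code in library_functions.items():
        table[bytes(code[:SIG_LEN])] = name
    return table


def find_library_functions(image, base_address, table):
    """Scan the image and return {address: name} for every signature hit."""
    found = {}
    for offset in range(len(image) - SIG_LEN + 1):
        # dict lookup hashes the candidate bytes, so no per-signature loop
        name = table.get(bytes(image[offset:offset + SIG_LEN]))
        if name is not None:
            found[base_address + offset] = name
    return found


# Hypothetical library and executable contents, for illustration only.
library = {
    "strlen": bytes.fromhex("5589e58b450831c9") + b"\x90" * 4,
    "memcpy": bytes.fromhex("5589e55756538b7d") + b"\x90" * 4,
}
image = b"\xcc" * 16 + library["memcpy"] + b"\xcc" * 4 + library["strlen"]

table = build_signature_table(library)
hits = find_library_functions(image, 0x401000, table)
for address, name in sorted(hits.items()):
    print(hex(address), name)
```

A practical signature format would also have to cope with bytes that vary between builds, such as relocated addresses, which this sketch ignores.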
The Back End
A decompiler’s back end is responsible for producing actual high-level lan-
guage code from the processed code that is produced during the code analysis
stage. The back end is language-specific, and just as a compiler’s back end is
interchangeable to allow the compiler to support more than one processor
architecture, so is a decompiler’s back end. It can be fairly easily replaced to
get the decompiler to produce different high-level language outputs.
Here is a brief overview of how the back end produces code from the
instructions in the intermediate representation. Instructions such as the
assignment instruction, typically referred to as asgn, are fairly trivial to
process because asgn already contains expression trees that simply need to be
rendered as text. The call and ret instructions are also fairly trivial. During
data-flow analysis the decompiler prepares an argument list for call
instructions and locates the return value for the ret instruction. These are
stored along with the instructions and must simply be printed in the correct
syntax (depending on the target language) during the code-generation phase.
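As a rough illustration of this rendering step, the following sketch prints asgn, call, and ret instructions as C-like text. The dictionary layout of the instructions and the tuple form of the expression trees are assumptions made for the example, not the representation used by any particular decompiler.

```python
def render_expr(node):
    """Render an expression tree: a string leaf or an (op, left, right) tuple."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({render_expr(left)} {op} {render_expr(right)})"


def render_instruction(instr):
    """Turn one IR instruction (a dict, hypothetical layout) into C-like text."""
    kind = instr["kind"]
    if kind == "asgn":
        # The expression tree is already attached; it only needs printing.
        return f"{instr['dest']} = {render_expr(instr['expr'])};"
    if kind == "call":
        # The argument list was prepared earlier, during data-flow analysis.
        args = ", ".join(render_expr(a) for a in instr["args"])
        return f"{instr['target']}({args});"
    if kind == "ret":
        value = instr.get("value")
        return "return;" if value is None else f"return {render_expr(value)};"
    raise ValueError(f"unsupported instruction kind: {kind}")


# Hypothetical instruction stream, for illustration only.
program = [
    {"kind": "asgn", "dest": "total", "expr": ("+", "a", ("*", "b", "c"))},
    {"kind": "call", "target": "print_total", "args": ["total"]},
    {"kind": "ret", "value": "total"},
]
lines = [render_instruction(instr) for instr in program]
print("\n".join(lines))
```

Producing a different output language would mostly mean replacing the format strings, which is what makes this stage easy to swap.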
Probably the most complex step in this process is the creation of control flow
statements from the structured control flow graph. Here, the decompiler must
correctly choose the most suitable high-level language constructs for
representing the control flow graph. For instance, most high-level languages
support a variety of loop constructs such as “do...while”, “while...”, and
“for...” loops. Additionally, depending on the specific language, the code
might have unconditional jumps inside the loop body. These must be translated
to keywords such as break or continue, assuming that such keywords
(or ones equivalent to them) are supported in the target language.
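The choice of loop construct and the translation of jumps can be sketched as follows. The node layout ("loop", "if", "jump", "stmt"), the "pre"/"post" test markers, and the "loop_exit"/"loop_header" jump targets are all hypothetical names chosen for this example. The sketch assumes the control flow graph has already been structured and emits C-like text only.

```python
def emit(node, indent=0):
    """Return a list of C-like source lines for one structured node."""
    pad = "    " * indent
    kind = node["kind"]
    if kind == "stmt":
        return [pad + node["text"]]
    if kind == "jump":
        # A jump to the loop exit becomes break; one to the header, continue.
        keyword = {"loop_exit": "break", "loop_header": "continue"}[node["target"]]
        return [pad + keyword + ";"]
    if kind in ("if", "loop"):
        body = []
        for child in node["body"]:
            body += emit(child, indent + 1)
        if kind == "if":
            return [pad + f"if ({node['cond']}) {{"] + body + [pad + "}"]
        if node["test"] == "pre":
            # Condition tested before the body runs: a while loop.
            return [pad + f"while ({node['cond']}) {{"] + body + [pad + "}"]
        if node["test"] == "post":
            # Condition tested after the body runs: a do...while loop.
            return [pad + "do {"] + body + [pad + f"}} while ({node['cond']});"]
    raise ValueError(f"unsupported node: {node!r}")


# Hypothetical structured loop, for illustration only.
loop = {
    "kind": "loop",
    "test": "post",
    "cond": "i < n",
    "body": [
        {"kind": "stmt", "text": "i = i + 1;"},
        {"kind": "if", "cond": "buf[i] == 0",
         "body": [{"kind": "jump", "target": "loop_exit"}]},
    ],
}
lines = emit(loop)
print("\n".join(lines))
```

A back end for a language without break or continue would need a different strategy for the "jump" case, such as an extra flag variable or a goto.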
Generating code for two-way or n-way conditionals is fairly straightfor-
ward at this point, considering that the conditions have been analyzed during