Reversing : The Hacker's Guide to Reverse Engineering


Control Flow Analysis


Control flow analysis is the process of converting the unstructured control
flow graphs constructed by the front end into structured graphs that represent
high-level language constructs. This is where the decompiler converts abstract
blocks and conditional jumps to specific control flow constructs that represent
high-level concepts such as pretested and posttested loops, two-way
conditionals, and so on.
A thorough discussion of these control flow constructs and the way they are
implemented by most modern compilers is given in Appendix A. The actual
algorithms used to convert unstructured graphs into structured control flow
graphs are beyond the scope of this book. Extensive coverage of these
algorithms can be found in [Cifuentes2] and [Cifuentes3].
Much of the control flow analysis is straightforward, but there are certain
compiler idioms that might warrant special attention at this stage in the
process. For example, many compilers tend to convert pretested loops to
posttested loops, while adding a special test before the beginning of the loop
to make sure that it is never entered if its condition is not satisfied. This is done
as an optimization, but it can reduce the readability of decompiled code if it is
not properly handled. A decompiler that performs a literal translation of this
layout would present the initial test as an additional if statement (one that
never existed in the original program source code), followed by a do...while
loop. It might make sense for a decompiler writer to identify this case and
structure the control flow graph so that it represents a regular pretested loop.
There are likely other cases like this where compiler optimizations alter the
control flow structure of the program in ways that reduce the readability of
decompiled output.
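
The loop idiom described above can be illustrated with a short C sketch. The two functions below are hypothetical examples written for this illustration (they are not taken from any particular compiler's output): the first is the pretested loop a programmer might write, and the second is roughly what a literal decompilation of the optimized layout might look like, with the guard test followed by a posttested loop.

```c
#include <stddef.h>

/* Source-level form: a regular pretested (while) loop. */
int sum_pretested(const int *values, size_t count)
{
    int total = 0;
    size_t i = 0;

    while (i < count) {
        total += values[i];
        i++;
    }
    return total;
}

/* Literal translation of the optimized layout: a guard test followed by
   a posttested (do...while) loop that repeats the same condition. */
int sum_literal(const int *values, size_t count)
{
    int total = 0;
    size_t i = 0;

    if (i < count) {             /* guard: skip the loop if the condition fails */
        do {
            total += values[i];
            i++;
        } while (i < count);     /* same condition, now tested at the bottom */
    }
    return total;
}
```

Both functions compute the same result for every input, including an empty array. A decompiler that recognizes an if statement whose condition matches the condition of the do...while loop it encloses could fold the pair back into the first form.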


Finding Library Functions


Most executables contain significant amounts of library code that is linked into
the executable. During the decompilation process it makes a lot of sense to
identify these functions, mark them, and avoid decompiling them. There are
several reasons why this is helpful:


■■ Decompiling all of this library code is often unnecessary and adds
redundant code to the decompiler’s output. By identifying library calls
you can completely eliminate library code and increase the quality and
relevance of your decompiled output.
■■ Properly identifying library calls means additional “symbols” in the
program because you now have the names of every internal library call,
which greatly improves the readability of the decompiled output.
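
The text on this page does not say how library functions are recognized. One possible approach, shown here only as a minimal sketch, is to compare the first bytes of each function against a table of known byte patterns, with wildcard positions for bytes that vary from one build to another (such as relocated addresses). The struct lib_signature type and both function names are assumptions made for this illustration, not part of any real tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature record: the expected leading bytes of a library
   function, plus a mask marking which bytes must match (0 = wildcard). */
struct lib_signature {
    const char    *name;
    const uint8_t *pattern;
    const uint8_t *mask;
    size_t         length;
};

/* Returns 1 if the first sig->length bytes of code match the signature,
   ignoring wildcard positions; returns 0 otherwise. */
static int signature_matches(const struct lib_signature *sig,
                             const uint8_t *code, size_t code_len)
{
    if (code_len < sig->length)
        return 0;

    for (size_t i = 0; i < sig->length; i++) {
        if (sig->mask[i] && code[i] != sig->pattern[i])
            return 0;
    }
    return 1;
}

/* Returns the name of the first matching signature, or NULL if the
   function body matches none of them and should be decompiled normally. */
const char *identify_library_function(const struct lib_signature *sigs,
                                      size_t sig_count,
                                      const uint8_t *code, size_t code_len)
{
    for (size_t i = 0; i < sig_count; i++) {
        if (signature_matches(&sigs[i], code, code_len))
            return sigs[i].name;
    }
    return NULL;
}
```

A decompiler built along these lines would call identify_library_function for each function it discovers, label the matches with the returned name, and skip decompiling their bodies. A linear scan is enough for a sketch; a real signature database would likely need a faster lookup structure and a way to resolve functions whose leading bytes are identical.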
