Reversing : The Hacker's Guide to Reverse Engineering

couldn’t directly update the EFLAGS register (nowadays this is easy, because
the two units are implemented on a single chip). This meant that the result of
a floating-point comparison was written into a separate floating-point status
register, which then had to be loaded into one of the general-purpose registers,
and from there it was possible to test its value and perform a conditional
branch. Let’s look at an example.

00401000 FLD DWORD PTR [ESP+4]
00401004 FCOMP DWORD PTR [ESP+8]
00401008 FSTSW AX
0040100A TEST AH,41
0040100D JNZ SHORT 0040101D

This snippet loads one floating-point value onto the floating-point stack
(essentially a floating-point register) and compares a second value against
it. Because the older FCOMP instruction is used, the result is stored in the
floating-point status word. Had the code used the newer FCOMIP instruction,
the outcome would have been written directly into EFLAGS, but that instruction
doesn’t exist in older versions of the processor. Because the result is stored
in the floating-point status word, you need to get it out of there in order to
test the result of the comparison and perform a conditional branch. This is
done using the FSTSW instruction, which copies the floating-point status word
into the AX register. Once that value is in AX, you can test the specific flags
(here the hexadecimal constant 41 selects the C0 and C3 condition bits as they
appear in AH) and perform the conditional branch.
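To make the flag test concrete, here is a minimal Python sketch that models what the sequence computes. It assumes the standard x87 layout, in which the condition bits C0, C2, and C3 sit at bits 8, 10, and 14 of the status word, so they appear in AH as 0x01, 0x04, and 0x40. The function names are hypothetical and exist only for illustration.

```python
import math

# x87 condition-code bits as they appear in the 16-bit status word.
C0 = 0x0100
C2 = 0x0400
C3 = 0x4000

def fcomp_status(st0, src):
    """Model the condition bits FCOMP sets when comparing ST(0) with src."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0      # unordered
    if st0 > src:
        return 0                 # all three bits clear
    if st0 < src:
        return C0
    return C3                    # equal

def jnz_taken(status_word, mask=0x41):
    """Model FSTSW AX / TEST AH,mask / JNZ: branch if any masked bit is set."""
    ah = (status_word >> 8) & 0xFF
    return (ah & mask) != 0
```

With the mask 0x41, the branch is taken whenever C0 or C3 is set, that is, whenever the loaded value is less than or equal to the compared value (or the operands are unordered). It falls through only when the loaded value is greater.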
The bottom line is that to translate this sequence into the decompiler’s
intermediate representation (which is not supposed to contain any
architecture-specific details), the front end must “understand” the sequence
for what it is, a floating-point comparison followed by a conditional branch,
and eliminate the code that tests for specific flags (the constant 0x41) and
so on. This is usually implemented by adding specific code to the front end
that knows how to decipher these types of sequences.
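One way such front-end code might look is sketched below in Python. This is an illustration under stated assumptions, not how any particular decompiler does it: the instruction tuples, the `branch_if` and `native` intermediate forms, the `JNZ_CONDITIONS` table, and the `lift_fp_compare` function are all hypothetical, and unordered (NaN) operands are ignored.

```python
# Hypothetical table: the TEST AH mask that precedes JNZ, mapped to the
# relation "ST(0) <op> operand" under which the branch is taken.
# Unordered (NaN) comparisons are ignored in this sketch.
JNZ_CONDITIONS = {0x01: "<", 0x40: "==", 0x41: "<="}

PATTERN = ["FLD", "FCOMP", "FSTSW", "TEST", "JNZ"]

def lift_fp_compare(instrs):
    """Replace FLD/FCOMP/FSTSW AX/TEST AH,mask/JNZ with one neutral branch.

    instrs is a list of (mnemonic, operands) tuples. Anything that does
    not match the pattern is passed through tagged as "native".
    """
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 5]
        if (len(window) == 5
                and [m for m, _ in window] == PATTERN
                and window[2][1] == ("AX",)
                and window[3][1][0] == "AH"
                and window[3][1][1] in JNZ_CONDITIONS):
            lhs = window[0][1][0]            # value loaded by FLD
            rhs = window[1][1][0]            # value compared by FCOMP
            op = JNZ_CONDITIONS[window[3][1][1]]
            target = window[4][1][0]
            out.append(("branch_if", op, lhs, rhs, target))
            i += 5
        else:
            out.append(("native",) + instrs[i])
            i += 1
    return out

# The listing from the text, written as tuples.
listing = [
    ("FLD", ("[ESP+4]",)),
    ("FCOMP", ("[ESP+8]",)),
    ("FSTSW", ("AX",)),
    ("TEST", ("AH", 0x41)),
    ("JNZ", ("0040101D",)),
]
lifted = lift_fp_compare(listing)
```

For the listing above, the five machine instructions collapse into a single entry stating that control transfers to 0040101D when the first value is less than or equal to the second. Neither the status word nor the constant 0x41 survives into the intermediate form.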

Generating Control Flow Graphs


The code generated by a decompiler’s front end is represented in a graph
structure, where each code block is called a basic block (BB). This graph
structure simply represents the control flow instructions present in the
low-level machine code. Each BB ends either with a control flow instruction,
such as a branch instruction, a call, or a ret, or just before a label that is
referenced by some branch instruction elsewhere in the code (because labels
represent a control flow join).
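The block boundaries just described can be sketched in a few lines of Python. This is a simplified illustration, not a complete front end: the instruction tuples, the `BLOCK_ENDERS` subset, the addresses in the sample, and the `split_basic_blocks` function are assumptions made for the example.

```python
# Illustrative subset of instructions that end a basic block.
BLOCK_ENDERS = {"JMP", "JNZ", "JZ", "CALL", "RET"}

def split_basic_blocks(instrs):
    """Split a linear listing into basic blocks.

    instrs is a list of (address, mnemonic, target_or_None) in address
    order. A new block starts at the first instruction, at every branch
    target (a label), and right after every block-ending instruction.
    """
    addresses = {addr for addr, _, _ in instrs}
    starts = {instrs[0][0]} if instrs else set()
    for idx, (addr, mnem, target) in enumerate(instrs):
        if mnem in BLOCK_ENDERS:
            # Call targets belong to other functions, so only branch
            # targets inside this listing are treated as labels here.
            if mnem != "CALL" and target in addresses:
                starts.add(target)
            if idx + 1 < len(instrs):
                starts.add(instrs[idx + 1][0])

    blocks, current = [], []
    for ins in instrs:
        if ins[0] in starts and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks

# Hypothetical listing: a compare-and-branch with two arms and a join.
code = [
    (0x00, "FLD", None),
    (0x04, "FCOMP", None),
    (0x08, "FSTSW", None),
    (0x0A, "TEST", None),
    (0x0D, "JNZ", 0x1D),
    (0x0F, "MOV", None),
    (0x14, "JMP", 0x22),
    (0x1D, "MOV", None),
    (0x22, "RET", None),
]
blocks = split_basic_blocks(code)
```

In this sample the conditional branch closes the first block, each arm becomes its own block, and the ret at the join point forms a fourth block because it is the target of the unconditional jump.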
Blocks are defined for each code segment that is referenced elsewhere in
the code, typically by a branch instruction. Additionally, a BB is created after
every conditional branch instruction, so that a conditional branch instruction
