Reversing : The Hacker's Guide to Reverse Engineering

and generates a corresponding machine code file. Depending on the high-level
language, this machine code can either be standard platform-specific object
code that is decoded directly by the CPU, or it can be encoded in a special
platform-independent format called bytecode (see the following section on
bytecodes).
Compilers of traditional (non-bytecode-based) programming languages
such as C and C++ generate machine-readable object code directly from the
textual source code. This means that the resulting object code, when
translated to assembly language by a disassembler, is essentially a
machine-generated assembly language program. Of course, it is not entirely
machine-generated, because the software developer described to the compiler
what needed to be done in the high-level language. But the details of how
things are carried out in the resulting object code are decided by the
compiler. This is an important point because such code is not always easily
understandable, even when compared to a hand-written assembly language
program: machines think differently than human beings.
The biggest hurdle in deciphering compiler-generated code is the set of
optimizations applied by most modern compilers. Compilers employ a variety of
techniques that minimize code size and improve execution performance. The
problem is that the resulting optimized code is often counterintuitive and
difficult to read. For instance, optimizing compilers often replace
straightforward instructions with mathematically equivalent operations whose
purpose can be far from obvious at first glance.
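As a rough illustration of such equivalent operations, the sketch below
models two rewrites that compilers frequently apply to 32-bit unsigned
arithmetic: multiplication by a constant expressed as a shift and an add, and
division by a constant expressed as a multiplication by a "magic" reciprocal
followed by a shift. The function names are hypothetical, Python stands in for
the machine instructions a real compiler would emit, and the exact sequence
produced depends on the compiler and target CPU.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit unsigned register


def times_nine_plain(x):
    """What the programmer wrote: x * 9."""
    return (x * 9) & MASK32


def times_nine_optimized(x):
    """Shift-and-add form: (x << 3) + x, which avoids a multiply instruction."""
    return ((x << 3) + x) & MASK32


def div_three_plain(x):
    """What the programmer wrote: unsigned x / 3."""
    return x // 3


def div_three_optimized(x):
    """Multiply by the reciprocal constant 0xAAAAAAAB, then shift right by 33.

    0xAAAAAAAB equals (2**33 + 1) / 3, so for any 32-bit unsigned x the
    product shifted right by 33 bits equals x // 3.
    """
    return (x * 0xAAAAAAAB) >> 33


if __name__ == "__main__":
    for value in (0, 1, 7, 1000, 0xFFFFFFFF):
        assert times_nine_plain(value) == times_nine_optimized(value)
        assert div_three_plain(value) == div_three_optimized(value)
    print("both forms agree on the sampled inputs")
```

A reader who meets the constant 0xAAAAAAAB in a disassembly listing sees no
division at all, which is exactly why such code is hard to recognize.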
Significant portions of this book are dedicated to the art of deciphering
machine-generated assembly language. We will study some compiler basics in
Chapter 2 and then proceed to specific techniques that can be used to
extract meaningful information from compiler-generated code.

Virtual Machines and Bytecodes


Compilers for high-level languages such as Java generate bytecode instead of
object code. Bytecode is similar to object code, except that it is usually
decoded by a program instead of a CPU. The idea is to have a compiler
generate the bytecode, and then to use a program called a virtual machine to
decode the bytecode and perform the operations described in it. Of course, the
virtual machine itself must at some point convert the bytecode into standard
object code that is compatible with the underlying CPU.
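To make the decode-and-execute idea concrete, here is a minimal sketch of a
virtual machine for a toy stack-based bytecode. The opcode values, the
one-byte operand encoding, and the `run` function are all invented for this
illustration and do not correspond to Java bytecode or any other real format.

```python
# Hypothetical opcodes for a toy stack machine (not a real bytecode format).
PUSH, ADD, MUL, HALT = 0x01, 0x02, 0x03, 0xFF


def run(bytecode):
    """Decode and execute toy bytecode; return the value on top of the stack."""
    stack = []
    pc = 0  # offset of the next byte to decode
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            stack.append(bytecode[pc])  # one-byte operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")
    return stack[-1] if stack else None


# Toy program computing (2 + 3) * 4.
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])

if __name__ == "__main__":
    print(run(program))
```

This sketch interprets each instruction directly instead of translating the
bytecode into native object code; it is meant only to show that the bytecode
is consumed by a program, with the host CPU executing that program.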
There are several major benefits to using bytecode-based languages. One
significant advantage is platform independence. The virtual machine can be
ported to different platforms, which enables running the same binary program
on any CPU as long as it has a compatible virtual machine. Of course,
regardless of which platform the virtual machine is currently running on, the
bytecode format stays the same. This means that theoretically software developers

