Reversing : The Hacker's Guide to Reverse Engineering

Compiler-generated code can be difficult to read. Sometimes it is so different from the original code structure of the program that it becomes difficult to determine the software developer's original intentions. A similar problem happens with arithmetic sequences: they are often rearranged to make them more efficient, and one ends up with an odd-looking sequence of arithmetic operations that might be very difficult to comprehend. The bottom line is that developing an understanding of the processes undertaken by compilers and the way they "perceive" the code will help in eventually deciphering their output.

The following sections provide a bit of background information on compilers and how they operate, and describe the different stages that take place inside the average compiler. While it is true that the following sections could be considered optional, I would still recommend that you go over them at some point if you are not familiar with basic compilation concepts. I firmly believe that reversers must truly know their systems, and no one can truly claim to understand the system without understanding how software is created and built.

It should be emphasized that compilers are extremely complex programs that combine a variety of fields in computer science research and can have millions of lines of code. The following sections are by no means comprehensive; they merely scratch the surface. If you'd like to deepen your knowledge of compilers and compiler optimizations, you should check out [Cooper] Keith D. Cooper and Linda Torczon. Engineering a Compiler. Morgan Kaufmann Publishers, 2004, for a highly readable tutorial on compilation techniques, or [Muchnick] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997, for a more detailed discussion of advanced compilation topics such as optimizations.

Defining a Compiler

At its most basic level, a compiler is a program that takes one representation of a program as its input and produces a different representation of the same program. In most cases, the input representation is a text file containing code that complies with the specifications of a certain high-level programming language. The output representation is usually a lower-level translation of the same program. Such a lower-level representation is usually read by hardware or software, and rarely by people. The bottom line is that compilers usually transform programs from their high-level, human-readable form into a lower-level, machine-readable form.

During the translation process, compilers usually go through numerous improvement or optimization steps that take advantage of the compiler's "understanding" of the program and employ various algorithms to improve the code's efficiency. As I have already mentioned, these optimizations tend to have a strong "side effect": they seriously degrade the emitted code's readability. Compiler-generated code is simply not meant for human consumption.
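To make the readability cost of optimization a little more concrete, here is a minimal sketch of the kind of arithmetic rearrangement an optimizer might perform. The function names are hypothetical and chosen only for illustration, and the "rewritten" versions are hand-written C standing in for what a compiler could emit at the machine-code level; the actual transformation depends on the compiler, its settings, and the target processor.

```c
#include <stdint.h>

/* What the programmer wrote: a plain multiplication by a constant. */
uint32_t scale_source(uint32_t x)
{
    return x * 9;
}

/* One possible rewrite: x * 9 is the same as (x * 8) + x, and
 * multiplying by 8 is a left shift by 3. Unsigned arithmetic wraps
 * modulo 2^32, so both forms agree for every input. */
uint32_t scale_rewritten(uint32_t x)
{
    return (x << 3) + x;
}

/* What the programmer wrote: an unsigned division by a power of two. */
uint32_t div_source(uint32_t x)
{
    return x / 8;
}

/* One possible rewrite: for unsigned values, dividing by 8 is a
 * right shift by 3. (Signed division needs extra adjustment steps,
 * which is not shown here.) */
uint32_t div_rewritten(uint32_t x)
{
    return x >> 3;
}
```

Each pair of functions computes the same result, but a reader who sees only the shift-and-add form has to work backward to recover the original intent of "multiply by 9" or "divide by 8". That is the deciphering effort described above, on a very small scale.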
