Reverse Engineering for Beginners


CHAPTER 3. HELLO, WORLD!


#include <stdio.h>


const char $SG3830[]="hello, world\n";


int main()
{
    printf($SG3830);
    return 0;
}


Let’s go back to the assembly listing. As we can see, the string is terminated by a zero byte, which is standard for C/C++
strings. More about C/C++ strings: 57.1.1 on page 630.
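
The zero terminator can also be observed from the C side. The following is a minimal sketch, not part of the original listing: the array name `greeting` and the three helper functions are hypothetical names introduced only for illustration, standing in for the compiler-generated `$SG3830`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the compiler-generated $SG3830 array. */
static const char greeting[] = "hello, world\n";

/* Number of characters as strlen() counts them (terminator excluded). */
size_t greeting_length(void)
{
    return strlen(greeting);
}

/* Number of bytes the array occupies in memory (terminator included). */
size_t greeting_storage(void)
{
    return sizeof(greeting);
}

/* Returns 1 if the final byte of the array is the zero terminator. */
int greeting_is_zero_terminated(void)
{
    return greeting[sizeof(greeting) - 1] == '\0';
}
```

The array occupies one byte more than `strlen()` reports; that extra byte is the zero that appears at the end of the string in the assembly listing.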


In the code segment, _TEXT, there is only one function so far: main(). The function main() starts with prologue code and
ends with epilogue code (like almost any function)^1.


After the function prologue we see the call to the printf() function: CALL _printf. Before the call, the address of the string
containing our greeting (that is, a pointer to it) is placed on the stack with the help of the PUSH instruction.


When the printf() function returns control to the main() function, the string address (or a pointer to it) is still on
the stack. Since we do not need it anymore, the stack pointer (the ESP register) needs to be corrected.


ADD ESP, 4 means add 4 to the ESP register value.


Why 4? Since this is a 32-bit program, we need exactly 4 bytes to pass an address through the stack. If it were x64 code, we
would need 8 bytes. ADD ESP, 4 is effectively equivalent to POP register but without using any register^2.
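
The arithmetic behind the constant can be sketched in C. This is an illustration under a simplifying assumption: every argument occupies exactly one stack slot, and the caller removes the arguments after the call, as in the listing above. The function name `caller_cleanup_bytes` is hypothetical.

```c
#include <stddef.h>

/* Bytes the caller must add to the stack pointer after a call,
   assuming each argument occupies exactly one stack slot
   (4 bytes in 32-bit code, 8 bytes in 64-bit code). */
size_t caller_cleanup_bytes(size_t arg_count, size_t slot_size)
{
    return arg_count * slot_size;
}
```

With one pointer argument and 4-byte slots the result is 4, which matches the ADD ESP, 4 in the listing. The sketch models only the slot size; it says nothing about which arguments a given calling convention actually passes on the stack.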


For the same purpose, some compilers (like the Intel C++ Compiler) may emit POP ECX instead of ADD (e.g., such a pattern
can be observed in the Oracle RDBMS code, as it is compiled with the Intel C++ compiler). This instruction has almost the
same effect, but the ECX register contents will be overwritten. The Intel C++ compiler probably uses POP ECX since this
instruction’s encoding is shorter than that of ADD ESP, x (1 byte for POP against 3 for ADD).
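
The size difference can be checked by writing out the machine-code bytes. The sketch below uses the standard 32-bit x86 encodings (83 C4 04 for ADD ESP, 4 and 59 for POP ECX); the array and function names are hypothetical and serve only this illustration.

```c
#include <stddef.h>

/* Machine-code bytes of the two stack-cleanup variants (32-bit x86). */
static const unsigned char add_esp_4[] = { 0x83, 0xC4, 0x04 }; /* ADD ESP, 4 */
static const unsigned char pop_ecx[]   = { 0x59 };             /* POP ECX    */

/* Encoded length of ADD ESP, 4 in bytes. */
size_t add_esp_4_size(void)
{
    return sizeof(add_esp_4);
}

/* Encoded length of POP ECX in bytes. */
size_t pop_ecx_size(void)
{
    return sizeof(pop_ecx);
}

/* Bytes saved per call site when POP ECX replaces ADD ESP, 4. */
size_t bytes_saved_by_pop(void)
{
    return sizeof(add_esp_4) - sizeof(pop_ecx);
}
```

The saving is two bytes per call site, at the cost of destroying whatever value ECX held.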


Here is an example of using POP instead of ADD from Oracle RDBMS:


Listing 3.2: Oracle RDBMS 10.2 Linux (app.o file)

.text:0800029A push ebx
.text:0800029B call qksfroChild
.text:080002A0 pop ecx


After calling printf(), the original C/C++ code contains the statement return 0, which returns 0 as the result of the main()
function.


In the generated code this is implemented by the instruction XOR EAX, EAX.


XOR is in fact just “eXclusive OR”^3, but compilers often use it instead of MOV EAX, 0, again because its encoding is
slightly shorter (2 bytes for XOR against 5 for MOV).


Some compilers emit SUB EAX, EAX, which means SUBtract the value in EAX from the value in EAX, which, in any case,
results in zero.
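
That all three instructions leave the same value in the register can be modeled in C. This is a sketch only: the function names are hypothetical, and a `uint32_t` parameter stands in for the previous contents of EAX.

```c
#include <stdint.h>

/* Models XOR EAX, EAX: any value XORed with itself is zero. */
uint32_t zero_by_xor(uint32_t eax)
{
    return eax ^ eax;
}

/* Models SUB EAX, EAX: any value minus itself is zero. */
uint32_t zero_by_sub(uint32_t eax)
{
    return eax - eax;
}

/* Models MOV EAX, 0: the old value is simply discarded. */
uint32_t zero_by_mov(uint32_t eax)
{
    (void)eax; /* the previous register contents do not matter */
    return 0;
}
```

Whatever the register held before, the result is zero in every case, so the compiler is free to pick whichever instruction has the shortest encoding.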


The last instruction, RET, returns control to the caller. Usually, this is C/C++ CRT^4 code, which, in turn, returns control to
the OS.


3.1.2 GCC


Now let’s try to compile the same C/C++ code with the GCC 4.4.1 compiler in Linux: gcc 1.c -o 1. Next, with the assistance
of the IDA disassembler, let’s see how the main() function was created. IDA, like MSVC, uses Intel-syntax^5.


Listing 3.3: code in IDA

main proc near


var_10 = dword ptr -10h


push ebp
mov ebp, esp

(^1) You can read more about it in the section about function prologues and epilogues (4 on page 22).
(^2) CPU flags, however, are modified.
(^3) Wikipedia
(^4) C runtime library: 68.1 on page 668
(^5) We could also have GCC produce assembly listings in Intel-syntax by applying the options -S -masm=intel.
