Hacking - The Art of Exploitation, 2nd Edition

Programming 21

As long as the compiled program works, the average programmer is only concerned with source code. But a hacker realizes that the compiled program is what actually gets executed out in the real world. With a better understanding of how the CPU operates, a hacker can manipulate the programs that run on it. We have seen the source code for our first program and compiled it into an executable binary for the x86 architecture. But what does this executable binary look like? The GNU development tools include a program called objdump, which can be used to examine compiled binaries. Let’s start by looking at the machine code the main() function was translated into.


reader@hacking:~/booksrc $ objdump -D a.out | grep -A20 main.:
08048374 <main>:
8048374: 55 push %ebp
8048375: 89 e5 mov %esp,%ebp
8048377: 83 ec 08 sub $0x8,%esp
804837a: 83 e4 f0 and $0xfffffff0,%esp
804837d: b8 00 00 00 00 mov $0x0,%eax
8048382: 29 c4 sub %eax,%esp
8048384: c7 45 fc 00 00 00 00 movl $0x0,0xfffffffc(%ebp)
804838b: 83 7d fc 09 cmpl $0x9,0xfffffffc(%ebp)
804838f: 7e 02 jle 8048393 <main+0x1f>
8048391: eb 13 jmp 80483a6 <main+0x32>
8048393: c7 04 24 84 84 04 08 movl $0x8048484,(%esp)
804839a: e8 01 ff ff ff call 80482a0 <printf@plt>
804839f: 8d 45 fc lea 0xfffffffc(%ebp),%eax
80483a2: ff 00 incl (%eax)
80483a4: eb e5 jmp 804838b <main+0x17>
80483a6: c9 leave
80483a7: c3 ret
80483a8: 90 nop
80483a9: 90 nop
80483aa: 90 nop
reader@hacking:~/booksrc $


The objdump program will spit out far too many lines of output to sensibly examine, so the output is piped into grep with the command-line option -A20, which also displays the 20 lines that follow each line matching the regular expression main.: (the line holding the <main>: label). Each byte is represented in hexadecimal notation, which is a base-16 numbering system. The numbering system you are most familiar with is base-10, in which you run out of single symbols at 10 and need to add an extra digit. Hexadecimal uses 0 through 9 to represent 0 through 9, but it also uses A through F to represent the values 10 through 15. This is a convenient notation since a byte contains 8 bits, each of which can be either true or false. This means a byte has 256 (2^8) possible values. Since each hexadecimal digit covers exactly 4 bits (16 = 2^4), each byte can be described with exactly 2 hexadecimal digits (16 × 16 = 256).


The hexadecimal numbers on the far left, starting with 0x8048374 (objdump prints them without the 0x prefix), are memory addresses. The bits of the machine language instructions must be put somewhere, and this somewhere is called memory. Memory is just a collection of bytes of temporary storage space that are numbered with addresses.
