Reversing : The Hacker's Guide to Reverse Engineering

prevent parallelism. The general strategy employed by modern IA-32 processors
for achieving parallelism is simply to execute two or more instructions at
the same time. The problems start when one instruction depends on a result
produced by another. In such cases, the instructions must be executed in
their original order to preserve the code’s functionality.
Because of these restrictions, modern compilers employ a multitude of
techniques for generating code that runs as efficiently as possible on
modern processors. This naturally has a strong impact on the readability of
disassembled code while reversing. Understanding the rationale behind these
optimization techniques can help you decipher the code they produce.
The following sections discuss the general architecture of modern IA-32
processors and how they achieve parallelism and high instruction throughput.

This subject is optional and is discussed here because it is always best to know
why things are as they are. In this case, having a general understanding of why
optimized IA-32 code is arranged the way it is can be helpful when trying to
decipher its meaning.



IA-32 COMPATIBLE PROCESSORS
Over the years, many companies have attempted to penetrate the lucrative
IA-32 processor market (which has long been dominated by Intel Corporation)
by creating IA-32-compatible processors. The strategy has usually been to
offer lower-priced processors that are 100 percent compatible with Intel’s
IA-32 processors and offer equivalent or improved performance. AMD (Advanced
Micro Devices) has been the most successful company in this market, with an
average market share of over 15 percent.
While you are getting to know IA-32 assembly language, there usually isn’t
any need to worry about other brands, because they are highly compatible with
Intel’s implementations. Even code that’s specifically optimized for Intel’s
NetBurst architecture usually runs extremely well on other implementations
such as AMD’s processors, so compilers rarely need to apply optimizations
specific to non-Intel processors.
One substantial AMD-specific feature is the 3DNow! instruction set. 3DNow!
defines a set of SIMD (single instruction, multiple data) instructions that can
perform multiple floating-point operations per clock cycle. 3DNow! competes
directly with Intel’s SSE, SSE2, and SSE3 (Streaming SIMD Extensions). In
addition to supporting their own 3DNow! instruction set, AMD processors also
support Intel’s SSE extensions to maintain compatibility. Needless to say,
Intel processors don’t support 3DNow!.