Basically, what you have here is y=x*2*2*2*2*2, which is equivalent to
y=x*32. This code, generated by Intel’s compiler, is actually quite surprising
when you think about it. First of all, in terms of code size it is big—one LEAand
four ADDs are quite a bit longer than a single SHL. Second, it is surprising that
this sequence is actually quicker than a simple SHLby 5, especially consider-
ing that SHLis considered to be a fairly high-performance instruction. The
explanation is that LEAand ADDare both very low-latency, high-throughput
instructions. In fact, this entire sequence could probably execute in less than
three clock cycles (though this depends on the specific processor and on other
environmental aspects). In contrast, SHLhas a latency of four clocks cycles,
which is why using it is just not as efficient.
Let’s examine another multiplication sequence:
lea eax, DWORD PTR [esi + esi * 2]
sal eax, 2
sub eax, esi
This sequence, which was generated by GCC, uses LEAto multiply ESIby
3, and then uses SAL(SALis the same instruction as SHL—they share the same
opcode) to further multiply by 4. These two operations multiply the operand
by 12. The code then subtracts the operand from the result. This sequence
essentially multiplies the operand by 11. Mathematically this can be viewed as:
y=(x+x*2)*4-x.
Division
For computers, division is the most complex operation in integer arithmetic.
The built-in instructions for division, DIVand IDIVare (relatively speaking)
veryslow and have a latency of over 50 clock cycles (on latest crops of NetBurst
processors). This compares with a latency of less than one cycle for additions
and subtractions (which can be executed in parallel). For unknown divisors,
the compiler has no choice but to use DIV. This is usually bad for performance
but is good for reversers because it makes for readable and straightforward
code.
With constant divisors, the situation becomes far more complicated. The
compiler can employ some highly creative techniques for efficiently imple-
menting division, depending on the divisor. The problem is that the resulting
code is often highly unreadable. The following sections discuss reciprocal mul-
tiplication, which is an optimized division technique.
Understanding Reciprocal-Multiplications
The idea with reciprocal multiplication is to use multiplication instead of divi-
sion in order to implement a division operation. Multiplication is 4 to 6 times
524 Appendix B
22_574817 appb.qxd 3/16/05 8:45 PM Page 524