3.9. DIVISION USING MULTIPLICATION
Listing 3.20: Non-optimizing GCC 4.4.1
public f
f proc near
arg_0 = dword ptr 8
push ebp
mov ebp, esp
mov ecx, [ebp+arg_0]
mov edx, 954437177 ; 38E38E39h
mov eax, ecx
imul edx
sar edx, 1
mov eax, ecx
sar eax, 1Fh
mov ecx, edx
sub ecx, eax
mov eax, ecx
pop ebp
retn
f endp
3.9.2 How it works
From school-level mathematics, we can remember that division by 9 can be replaced by multiplication by
1
9. In fact, sometimes compilers do so for floating-point arithmetics, for example,FDIVinstruction in x86
code can be replaced byFMUL. At least MSVC 6.0 will replace division by 9 by multiplication by 0 : 111111 :::
and sometimes it’s hard to be sure, what operation was in original source code.
But when we operate over integer values and integer CPU registers, we can’t use fractions. However, we
can rework fraction like that:
result=x 9 =x⋅^19 =x⋅^19 ⋅⋅MagicNumberMagicNumber
Given the fact that division by 2 nis very fast (using shifts), we now should find thatM agicN umber, for
which the following equation will be true: 2 n= 9⋅M agicN umber.
Division by 232 is somewhat hidden: lower 32-bit of product in EAX is not used (dropped), only higher
32-bit of product (in EDX) is used and then shifted by additional 1 bit.
In other words, the assembly code we have just seen multiplicates by
954437177
2 32+1 , or divides by
2 32+1
954437177.
To find divisor we just have to divide numerator by denominator. Using Wolfram Alpha, we can get
8.99999999.... as result (which is close to 9).
Read more about it in [Henry S. Warren,Hacker’s Delight, (2002)10-3].
Couple of words about better understanding. Many people miss “hidden” division by 232 or 264 , when lower
32-bit part (or 64-bit part) of product is not used. Also, there is misconception that modulo inverse is used
here. This is close, but not the same thing. Extended Euclidean algorithm is usually used to findmagic
coefficient, but in fact, this algorithm is rather used to solve the equation. You can solve it using any
other method. Anyway, Extended Euclidean algorithm is probably the most efficient way to solve it. Also,
needless to mention, the equation is unsolvable for some divisors, because this is diophantine equation
(i.e., equation allowing result to be only integer), since we work on integer CPU registers, after all.
3.9.3 ARM.
The ARM processor, just like in any other “pure” RISC processor lacks an instruction for division. It also
lacks a single instruction for multiplication by a 32-bit constant (recall that a 32-bit constant cannot fit
into a 32-bit opcode).
By taking advantage of this clever trick (orhack), it is possible to do division using only three instructions:
addition, subtraction and bit shifts (1.22 on page 304).
Here is an example that divides a 32-bit number by 10, from [Advanced RISC Machines Ltd,The ARM
Cookbook, (1994)3.3 Division by a Constant]. The output consists of the quotient and the remainder.