1.19. FLOATING-POINT UNIT
GCC 4.4.1 (with the -O3 option) emits almost the same code, with slight differences:
Listing 1.202: Optimizing GCC 4.4.1
public f
f proc near
arg_0 = qword ptr 8
arg_8 = qword ptr 10h
push ebp
fld ds:dbl_8048608 ; 3.14
; stack state now: ST(0) = 3.14
mov ebp, esp
fdivr [ebp+arg_0]
; stack state now: ST(0) = result of division
fld ds:dbl_8048610 ; 4.1
; stack state now: ST(0) = 4.1, ST(1) = result of division
fmul [ebp+arg_8]
; stack state now: ST(0) = result of multiplication, ST(1) = result of division
pop ebp
faddp st(1), st
; stack state now: ST(0) = result of addition
retn
f endp
The difference is that 3.14 is first pushed onto the stack (into ST(0)), and then the value in arg_0 is
divided by the value in ST(0).
FDIVR stands for Reverse Divide: division with the divisor and dividend swapped. There is
no such instruction for multiplication, since it is a commutative operation, so we just have FMUL without
an -R counterpart.
FADDP adds the two values but also pops one value from the stack. After that operation, ST(0) holds the
sum.
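The stack comments in the listing above can be replayed with a small C model. This is only a sketch: the fpu_t type and the helper names (fld, fdivr_mem, fmul_mem, faddp_st1, f_model) are hypothetical and exist just for illustration, and the model ignores x87 details such as 80-bit internal precision and the status word.

```c
#include <assert.h>

/* Hypothetical toy model of the x87 register stack: st[0] plays the role of ST(0). */
typedef struct {
    double st[8];
    int depth;
} fpu_t;

/* FLD: push a value; it becomes the new ST(0), old values move down by one. */
void fld(fpu_t *f, double v)
{
    assert(f->depth < 8);
    for (int i = f->depth; i > 0; i--)
        f->st[i] = f->st[i - 1];
    f->st[0] = v;
    f->depth++;
}

/* FDIVR mem: ST(0) = mem / ST(0), i.e. operands swapped compared to FDIV. */
void fdivr_mem(fpu_t *f, double mem)
{
    f->st[0] = mem / f->st[0];
}

/* FMUL mem: ST(0) = ST(0) * mem. */
void fmul_mem(fpu_t *f, double mem)
{
    f->st[0] = f->st[0] * mem;
}

/* FADDP ST(1), ST: ST(1) = ST(1) + ST(0), then pop, so the sum ends up in ST(0). */
void faddp_st1(fpu_t *f)
{
    assert(f->depth >= 2);
    f->st[1] = f->st[1] + f->st[0];
    for (int i = 0; i < f->depth - 1; i++)
        f->st[i] = f->st[i + 1];
    f->depth--;
}

/* Replays the instruction sequence of the listing, one call per FPU instruction. */
double f_model(double a, double b)
{
    fpu_t fpu = { {0}, 0 };
    fld(&fpu, 3.14);     /* ST(0) = 3.14 */
    fdivr_mem(&fpu, a);  /* ST(0) = a / 3.14 */
    fld(&fpu, 4.1);      /* ST(0) = 4.1, ST(1) = a / 3.14 */
    fmul_mem(&fpu, b);   /* ST(0) = b * 4.1, ST(1) = a / 3.14 */
    faddp_st1(&fpu);     /* ST(0) = a / 3.14 + b * 4.1 */
    return fpu.st[0];
}
```

Calling f_model(a, b) should give the same value as the C expression a/3.14 + b*4.1, up to rounding. Replacing fdivr_mem with a plain ST(0) / mem division would compute 3.14/a instead, which is why the compiler needs the reversed form here.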
ARM: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)
Until ARM got standardized floating point support, several processor manufacturers added their own
instruction extensions. Then, VFP (Vector Floating Point) was standardized.
One important difference from x86 is that ARM has no FPU register stack: you work only with registers.
Listing 1.203: Optimizing Xcode 4.6.3 (LLVM) (ARM mode)
f
VLDR D16, =3.14
VMOV D17, R0, R1 ; load "a"
VMOV D18, R2, R3 ; load "b"
VDIV.F64 D16, D17, D16 ; a/3.14
VLDR D17, =4.1
VMUL.F64 D17, D18, D17 ; b*4.1
VADD.F64 D16, D17, D16 ; +
VMOV R0, R1, D16
BX LR
dbl_2C98 DCFD 3.14 ; DATA XREF: f
dbl_2CA0 DCFD 4.1 ; DATA XREF: f+10
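The register-only style of this listing can be mirrored in C. The sketch below assumes a little-endian target where a double argument arrives in a pair of 32-bit registers with the low word first (R0 low, R1 high), and a host whose double is IEEE 754 binary64. The names vmov_to_d, vmov_from_d and f_arm_model are hypothetical and used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Model of VMOV Dd, Rlo, Rhi: build one 64-bit double from two 32-bit words. */
double vmov_to_d(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no numeric conversion */
    return d;
}

/* Model of VMOV Rlo, Rhi, Dd: split a double back into two 32-bit words. */
void vmov_from_d(double d, uint32_t *lo, uint32_t *hi)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *lo = (uint32_t)bits;
    *hi = (uint32_t)(bits >> 32);
}

/* Replays the listing with local variables standing in for D16..D18. */
void f_arm_model(uint32_t r0, uint32_t r1, uint32_t r2, uint32_t r3,
                 uint32_t *out_r0, uint32_t *out_r1)
{
    double d16 = 3.14;               /* VLDR D16, =3.14 */
    double d17 = vmov_to_d(r0, r1);  /* VMOV D17, R0, R1 ; load "a" */
    double d18 = vmov_to_d(r2, r3);  /* VMOV D18, R2, R3 ; load "b" */
    d16 = d17 / d16;                 /* VDIV.F64 D16, D17, D16 ; a/3.14 */
    d17 = 4.1;                       /* VLDR D17, =4.1 */
    d17 = d18 * d17;                 /* VMUL.F64 D17, D18, D17 ; b*4.1 */
    d16 = d17 + d16;                 /* VADD.F64 D16, D17, D16 */
    vmov_from_d(d16, out_r0, out_r1);/* VMOV R0, R1, D16 */
}
```

Unlike the x86 version, no value is ever pushed or popped: each instruction names its destination and source registers explicitly, and the result is returned in the R0/R1 pair.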