CHAPTER 17. FLOATING-POINT UNIT
Non-optimizing GCC (Linaro) 4.9
d_max:
; save input arguments in "Register Save Area"
        sub     sp, sp, #16
        str     d0, [sp,8]
        str     d1, [sp]
; reload values
        ldr     x1, [sp,8]
        ldr     x0, [sp]
        fmov    d0, x1
        fmov    d1, x0
; D0 - a, D1 - b
        fcmpe   d0, d1
        ble     .L76
; a>b; load D0 (a) into X0
        ldr     x0, [sp,8]
        b       .L74
.L76:
; a<=b; load D1 (b) into X0
        ldr     x0, [sp]
.L74:
; result in X0
        fmov    d0, x0
; result in D0
        add     sp, sp, 16
        ret
Non-optimizing GCC is more verbose. First, the function saves its input argument values in the local stack (Register Save
Area). Then the code reloads these values into registers X0/X1 and finally copies them to D0/D1 to be compared using
FCMPE. A lot of redundant code, but that is how non-optimizing compilers work. FCMPE compares the values and sets the
condition flags (NZCV in ARM64, the counterpart of the APSR flags in 32-bit ARM). At this point, the compiler is not yet
thinking about the more convenient FCSEL instruction, so it proceeds the old way, with the BLE instruction (Branch if
Less than or Equal). In the first case (a>b), the value of a gets loaded into X0. In the other case (a<=b), the value
of b gets loaded into X0. Finally, the value from X0 gets copied into D0, because the return value needs to be in this
register.
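The data flow described above can be sketched in C. This is only a model under stated assumptions, not compiler output: the names d_max_model, bits_of and double_of are hypothetical, the local array stands in for the Register Save Area, and memcpy() stands in for moving raw 64 bits between an X-register and a D-register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the 64 bits of a double as an integer,
   like reloading with "ldr x1, [sp,8]" a value stored by "str d0, [sp,8]". */
static uint64_t bits_of(double v)
{
    uint64_t u;
    assert(sizeof u == sizeof v);   /* assumption: double is 64 bits wide */
    memcpy(&u, &v, sizeof u);
    return u;
}

/* Hypothetical helper: the reverse move, like "fmov d0, x1". */
static double double_of(uint64_t u)
{
    double v;
    memcpy(&v, &u, sizeof v);
    return v;
}

/* Model of the non-optimized d_max; variable names follow the registers. */
double d_max_model(double a, double b)
{
    double save_area[2];                    /* sub  sp, sp, #16 */
    save_area[1] = a;                       /* str  d0, [sp,8]  */
    save_area[0] = b;                       /* str  d1, [sp]    */

    uint64_t x1 = bits_of(save_area[1]);    /* ldr  x1, [sp,8]  */
    uint64_t x0 = bits_of(save_area[0]);    /* ldr  x0, [sp]    */
    double d0 = double_of(x1);              /* fmov d0, x1      */
    double d1 = double_of(x0);              /* fmov d1, x0      */

    if (d0 > d1)                            /* fcmpe + ble: skip this arm when a<=b */
        x0 = bits_of(save_area[1]);         /* ldr  x0, [sp,8]  */
    else
        x0 = bits_of(save_area[0]);         /* ldr  x0, [sp]    */

    return double_of(x0);                   /* fmov d0, x0      */
}
```

Written this way, the redundancy is easy to see: both values are already in d0 and d1 before the comparison, yet the selected one is fetched from the save area again and passed through x0.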
Exercise
As an exercise, you can try optimizing this piece of code manually by removing redundant instructions and not introducing
new ones (including FCSEL).
Optimizing GCC (Linaro) 4.9—float
Let’s also rewrite this example to use float instead of double.
float f_max (float a, float b)
{
        if (a>b)
                return a;
        return b;
}
f_max:
; S0 - a, S1 - b
        fcmpe   s0, s1
        fcsel   s0, s0, s1, gt
; now result in S0
        ret
It is the same code, but S-registers are used instead of D-registers. This is because numbers of type float are passed in
32-bit S-registers (which are in fact the lower halves of the 64-bit D-registers).
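As a rough illustration of the FCMPE/FCSEL pair, the following C sketch models the comparison as a flag and the select as a conditional expression. The names f_max_model, d_max_select and check_models are hypothetical, and the mapping in the comments is an interpretation of the listing above, not something the compiler guarantees.

```c
#include <assert.h>

/* Model of the optimized float version; parameter names follow the registers. */
float f_max_model(float s0, float s1)
{
    int gt = (s0 > s1);     /* fcmpe s0, s1 : GT holds only for an ordered "greater than" */
    return gt ? s0 : s1;    /* fcsel s0, s0, s1, gt */
}

/* The same shape for double; only the operand width (D-registers) changes. */
double d_max_select(double d0, double d1)
{
    int gt = (d0 > d1);     /* fcmpe d0, d1 */
    return gt ? d0 : d1;    /* fcsel d0, d0, d1, gt */
}

/* Small self-check: both widths pick the same operand for the same inputs. */
void check_models(void)
{
    assert(f_max_model(1.0f, 2.0f) == 2.0f);
    assert(d_max_select(1.0, 2.0) == 2.0);
    assert(f_max_model(5.0f, -3.0f) == 5.0f);
    assert(d_max_select(5.0, -3.0) == 5.0);
}
```

The two functions differ only in the type, which is why the float and double listings differ only in the register names.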