CHAPTER 17. FLOATING-POINT UNIT
Optimizing GCC 4.4.1
Listing 17.13: Optimizing GCC 4.4.1
public d_max
d_max proc near
arg_0 = qword ptr 8
arg_8 = qword ptr 10h
push ebp
mov ebp, esp
fld [ebp+arg_0] ; _a
fld [ebp+arg_8] ; _b
; stack state now: ST(0) = _b, ST(1) = _a
fxch st(1)
; stack state now: ST(0) = _a, ST(1) = _b
fucom st(1) ; compare _a and _b
fnstsw ax
sahf
ja short loc_8048448
; store ST(0) to ST(0) (idle operation), pop value at top of stack,
; leave _b at top
fstp st
jmp short loc_804844A
loc_8048448:
; copy _a from ST(0) to ST(1), pop value at top of stack, leave _a at top
fstp st(1)
loc_804844A:
pop ebp
retn
d_max endp
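The stack-state comments in the listing can be followed step by step with a small C model. This is only a sketch: the type fpu_t and the functions fld, fxch, fstp and d_max_model are hypothetical names introduced here for illustration, and the FUCOM/FNSTSW/SAHF/JA sequence is collapsed into a plain C comparison.

```c
#include <assert.h>

/* Toy model of the x87 register stack: st[0] plays the role of ST(0).
   All names here are illustrative and do not come from the listing. */
typedef struct {
    double st[8];
    int depth;
} fpu_t;

/* FLD: push a value; the old ST(0) becomes ST(1), and so on. */
static void fld(fpu_t *f, double v)
{
    assert(f->depth < 8);
    for (int i = f->depth; i > 0; i--)
        f->st[i] = f->st[i - 1];
    f->st[0] = v;
    f->depth++;
}

/* FXCH ST(i): swap ST(0) and ST(i). */
static void fxch(fpu_t *f, int i)
{
    assert(i < f->depth);
    double t = f->st[0];
    f->st[0] = f->st[i];
    f->st[i] = t;
}

/* FSTP ST(i): copy ST(0) into ST(i), then pop the stack. */
static void fstp(fpu_t *f, int i)
{
    assert(f->depth > 0 && i < f->depth);
    f->st[i] = f->st[0];
    for (int k = 0; k + 1 < f->depth; k++)
        f->st[k] = f->st[k + 1];
    f->depth--;
}

/* Mirrors the instruction order of the listing above. */
double d_max_model(double a, double b)
{
    fpu_t f = { {0}, 0 };
    fld(&f, a);               /* ST(0) = a            */
    fld(&f, b);               /* ST(0) = b, ST(1) = a */
    fxch(&f, 1);              /* ST(0) = a, ST(1) = b */
    if (f.st[0] > f.st[1])    /* stands in for FUCOM + FNSTSW + SAHF + JA */
        fstp(&f, 1);          /* overwrite b with a, pop: a is left at top */
    else
        fstp(&f, 0);          /* idle store, pop: b is left at top */
    return f.st[0];           /* the result is returned in ST(0) */
}
```

In this model, FSTP ST(1) leaves _a at the top because the copy happens before the pop, while FSTP ST simply discards _a and exposes _b.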
It is almost the same as before, except that JA is used after SAHF. In fact, the conditional jump instructions that check “greater”, “less” or “equal” for unsigned number comparison (these are JA, JAE, JB, JBE, JE/JZ, JNA, JNAE, JNB, JNBE, JNE/JNZ) check only the CF and ZF flags.
Let’s recall where the C3/C2/C0 bits are located in the AH register after the execution of FSTSW/FNSTSW:

  6    2    1    0
  C3   C2   C1   C0

Let’s also recall how the bits from AH are stored into the CPU flags by the execution of SAHF:

  7    6    4    2    0
  SF   ZF   AF   PF   CF

After the comparison, the C3 and C0 bits are moved into ZF and CF, so the conditional jumps work afterwards. JA is triggered if both CF and ZF are zero.
Thus, the conditional jump instructions listed here can be used after an FNSTSW/SAHF instruction pair.
Apparently, the FPU C3/C2/C0 status bits were placed there intentionally, so that they map easily to the base CPU flags without additional permutations.
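The bit mapping described above can be modeled in a few lines of C. This is a simplified sketch, not a full emulation: the names AH_C0, FL_CF, fucom_ah, sahf and ja_taken are hypothetical, and the other bits that really end up in AH after FNSTSW (such as the TOP field) are assumed to be zero.

```c
#include <assert.h>

/* Positions of the FPU condition codes in AH after FNSTSW. */
enum { AH_C0 = 1 << 0, AH_C1 = 1 << 1, AH_C2 = 1 << 2, AH_C3 = 1 << 6 };

/* Positions of the CPU flags that SAHF loads from AH. */
enum { FL_CF = 1 << 0, FL_PF = 1 << 2, FL_AF = 1 << 4,
       FL_ZF = 1 << 6, FL_SF = 1 << 7 };

/* Condition codes produced by FUCOM when comparing ST(0) with ST(1).
   Simplification: every AH bit other than C3/C2/C0 is left at zero. */
static unsigned char fucom_ah(double st0, double st1)
{
    if (st0 != st0 || st1 != st1)          /* a NaN operand: unordered */
        return AH_C3 | AH_C2 | AH_C0;
    if (st0 > st1)
        return 0;                          /* C3 = C2 = C0 = 0 */
    if (st0 < st1)
        return AH_C0;
    return AH_C3;                          /* equal */
}

/* SAHF: bits 7, 6, 4, 2, 0 of AH become SF, ZF, AF, PF, CF. */
static unsigned char sahf(unsigned char ah)
{
    return ah & (FL_SF | FL_ZF | FL_AF | FL_PF | FL_CF);
}

/* JA ("jump if above") is taken when CF = 0 and ZF = 0. */
static int ja_taken(unsigned char flags)
{
    /* Only the bits SAHF can load are expected here. */
    assert((flags & ~(FL_SF | FL_ZF | FL_AF | FL_PF | FL_CF)) == 0);
    return (flags & (FL_CF | FL_ZF)) == 0;
}
```

Because C0 and CF share bit 0 and C3 and ZF share bit 6, sahf() needs no shifting at all, which is the point of the remark above. For example, comparing 2.0 with 1.0 leaves all three condition codes cleared, so ja_taken() reports that the jump is taken.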
GCC 4.8.1 with -O3 optimization turned on
Some new FPU instructions were added in the P6 Intel family^19. These are FUCOMI (compare operands and set flags of the
main CPU) and FCMOVcc (works like CMOVcc, but on FPU registers). Apparently, the maintainers of GCC decided to drop
support of pre-P6 Intel CPUs (early Pentiums, 80486, etc.).
Also, the FPU is no longer a separate unit in the P6 Intel family, so it is now possible to modify/check the flags of the main CPU
from the FPU.
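To illustrate what these two instructions make possible, here is a rough C model of a maximum function built on them. It is a sketch under assumptions: the names flags_t, fucomi, fcmovbe and d_max_p6_model are hypothetical, and the choice of the FCMOVBE variant is made here for illustration only, not taken from any compiler output.

```c
#include <assert.h>

/* The three CPU flags that FUCOMI sets directly. */
typedef struct {
    int zf, pf, cf;
} flags_t;

/* FUCOMI ST(0), ST(i): compare and write ZF/PF/CF without
   going through the FPU status word, AX and SAHF. */
static flags_t fucomi(double st0, double sti)
{
    flags_t f = { 0, 0, 0 };
    if (st0 != st0 || sti != sti)      /* a NaN operand: unordered */
        f.zf = f.pf = f.cf = 1;
    else if (st0 < sti)
        f.cf = 1;
    else if (st0 == sti)
        f.zf = 1;
    return f;                          /* all zero means ST(0) > ST(i) */
}

/* FCMOVBE ST(0), ST(i): copy ST(i) into ST(0) if CF = 1 or ZF = 1.
   Returns the resulting value of ST(0). */
static double fcmovbe(flags_t f, double st0, double sti)
{
    return (f.cf || f.zf) ? sti : st0;
}

/* Maximum of two values with no conditional jump in the model. */
double d_max_p6_model(double a, double b)
{
    flags_t f = fucomi(a, b);
    /* In this model PF is set only in the unordered case, with ZF and CF. */
    assert(!f.pf || (f.zf && f.cf));
    return fcmovbe(f, a, b);
}
```

Compared with the FNSTSW/SAHF sequence, the comparison result lands in the CPU flags in one step, and the conditional move replaces the jump.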
(^19) Starting at Pentium Pro, Pentium-II, etc.
