    v.d=start;
    v.i++;
    return v.d-start;
}
int main()
{
    printf ("%g\n", calculate_machine_epsilon(1.0));
}
ARM64 has no instruction that can add a number to an FPU D-register, so the input value (which arrives in D0)
is first copied into a GPR, incremented, copied to the FPU register D1, and then the subtraction occurs.
Listing 1.360: Optimizing GCC 4.9 ARM64
calculate_machine_epsilon:
fmov x0, d0 ; load input value of double type into X0
add x0, x0, 1 ; X0++
fmov d1, x0 ; move it to FPU register
fsub d0, d1, d0 ; subtract
ret
See also this example compiled for x64 with SIMD instructions: 1.31.4 on page 437.
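The four ARM64 instructions map directly onto C. Below is a minimal sketch of the same steps, assuming a 64-bit IEEE 754 double. The function name epsilon_at() is hypothetical and is not part of the original example; memcpy() is used here in place of the union only to show that the trick is about reinterpreting bits, not about unions as such.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original listing): returns the gap
   between 'start' and the next representable double above it.
   Assumes a positive, finite 'start' and a 64-bit IEEE 754 double. */
static double epsilon_at(double start)
{
    uint64_t bits;
    double next;

    memcpy(&bits, &start, sizeof bits);  /* fmov x0, d0    : FPU -> GPR   */
    bits++;                              /* add  x0, x0, 1 : next pattern */
    memcpy(&next, &bits, sizeof next);   /* fmov d1, x0    : GPR -> FPU   */
    return next - start;                 /* fsub d0, d1, d0               */
}
```

For a start value of 1.0 the result should coincide with the DBL_EPSILON constant from float.h, and for 2.0 it should be twice as large, since the spacing between adjacent values doubles with each power of two.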
MIPS
The new instruction here is MTC1 (“Move To Coprocessor 1”); it simply transfers data from a GPR to the FPU’s
registers.
Listing 1.361: Optimizing GCC 4.4.5 (IDA)
calculate_machine_epsilon:
mfc1 $v0, $f12
or $at, $zero ; NOP
addiu $v1, $v0, 1
mtc1 $v1, $f2
jr $ra
sub.s $f0, $f2, $f12 ; branch delay slot
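Judging by the 32-bit MFC1/MTC1 moves and the single-precision SUB.S, this listing operates on a float rather than a double. A rough C equivalent of those steps might look like the sketch below, assuming a 32-bit IEEE 754 float; the name epsilon_at_f() is made up for illustration.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single-precision helper mirroring the MIPS listing.
   Assumes a positive, finite 'start' and a 32-bit IEEE 754 float. */
static float epsilon_at_f(float start)
{
    uint32_t bits;
    float next;

    memcpy(&bits, &start, sizeof bits);  /* mfc1  $v0, $f12     : FPU -> GPR */
    bits++;                              /* addiu $v1, $v0, 1                */
    memcpy(&next, &bits, sizeof next);   /* mtc1  $v1, $f2      : GPR -> FPU */
    return next - start;                 /* sub.s $f0, $f2, $f12             */
}
```

Called with 1.0f, this should return the same value as FLT_EPSILON from float.h.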
Conclusion
It’s hard to say whether anyone would need this trickery in real-world code, but as has been mentioned many
times in this book, this example serves well for explaining the IEEE 754 format and unions in C/C++.
1.26 FSCALE replacement
Agner Fog, in his work Optimizing subroutines in assembly language / An optimization guide for x86 platforms^169,
states that the FSCALE FPU instruction (which calculates 2^n) may be slow on many CPUs, and he offers
a faster replacement.
Here is my translation of his assembly code to C/C++:
#include <stdint.h>
#include <stdio.h>
union uint_float
{
uint32_t i;
float f;
};
float flt_2n(int N)
(^169) http://www.agner.org/optimize/optimizing_assembly.pdf