Assembly Language for Beginners




v.d=start;
v.i++;
return v.d-start;
}


int main()
{
    printf ("%g\n", calculate_machine_epsilon(1.0));
    return 0;
}


ARM64 has no instruction that can add a number to an FPU D-register, so the input value (which arrives in D0)
is first copied into a GPR, incremented, and copied to the FPU register D1, and then the subtraction occurs.


Listing 1.360: Optimizing GCC 4.9 ARM64

calculate_machine_epsilon:
fmov x0, d0 ; load input value of double type into X0
add x0, x0, 1 ; X0++
fmov d1, x0 ; move it to FPU register
fsub d0, d1, d0 ; subtract
ret


See also this example compiled for x64 with SIMD instructions: 1.31.4 on page 437.


MIPS


The new instruction here is MTC1 (“Move To Coprocessor 1”); it just transfers data from a GPR to the FPU’s
registers.


Listing 1.361: Optimizing GCC 4.4.5 (IDA)

calculate_machine_epsilon:
mfc1 $v0, $f12
or $at, $zero ; NOP
addiu $v1, $v0, 1
mtc1 $v1, $f2
jr $ra
sub.s $f0, $f2, $f12 ; branch delay slot


Conclusion


It’s hard to say whether anyone would need this trickery in real-world code, but, as has been mentioned many
times in this book, this example serves well for explaining the IEEE 754 format and unions in C/C++.


1.26 FSCALE replacement


Agner Fog, in his work Optimizing subroutines in assembly language / An optimization guide for x86 platforms
(^169), states that the FSCALE FPU instruction (calculating 2^n) may be slow on many CPUs, and he offers
a faster replacement.
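
The general idea behind such a replacement can be sketched without any FPU arithmetic. In the IEEE 754 single-precision format, 2^n has a zero sign bit, a zero mantissa and a biased exponent of n+127, so the bit pattern can be assembled with integer operations. This is only an illustration under that assumption, not necessarily Agner Fog's exact code; the name pow2_from_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds 2^n by writing the exponent field directly.
   Valid only for -126 <= n <= 127 (normal floats); other values of n
   are not handled. Assumes float is a 32-bit IEEE 754 value. */
float pow2_from_bits(int n)
{
    /* The exponent field occupies bits 23..30; sign and mantissa stay zero. */
    uint32_t bits = (uint32_t)(n + 127) << 23;
    float result;

    memcpy(&result, &bits, sizeof result);  /* reinterpret the bits as a float */
    return result;
}
```

For example, pow2_from_bits(10) returns 1024.0f and pow2_from_bits(-1) returns 0.5f.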


Here is my translation of his assembly code to C/C++:


#include <stdint.h>
#include <stdio.h>


union uint_float
{
uint32_t i;
float f;
};


float flt_2n(int N)


(^169) http://www.agner.org/optimize/optimizing_assembly.pdf
