Assembly Language for Beginners

1.26 FSCALE replacement.

    v.d=start;     // store the double into the union
    v.i++;         // add 1 to its bit pattern, viewed as an integer
    return v.d-start;
}

int main()
{
    printf ("%g\n", calculate_machine_epsilon(1.0));
    return 0;
}

ARM64 has no instruction that can add a number to an FPU D-register, so the input value (which arrives in D0)
is first copied into a GPR, incremented, copied to FPU register D1, and then the subtraction is performed.

Listing 1.360: Optimizing GCC 4.9 ARM64

calculate_machine_epsilon:
fmov x0, d0 ; load input value of double type into X0
add x0, x0, 1 ; X0++
fmov d1, x0 ; move it to FPU register
fsub d0, d1, d0 ; subtract
ret

See also this example compiled for x64 with SIMD instructions: 1.31.4 on page 437.

MIPS

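The new instruction here is MTC1 (“Move To Coprocessor 1”): it simply transfers data from a GPR to an FPU
register. Its counterpart MFC1 (“Move From Coprocessor 1”) does the opposite and copies the input value from $f12 into $v0.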

Listing 1.361: Optimizing GCC 4.4.5 (IDA)

calculate_machine_epsilon:
mfc1 $v0, $f12
or $at, $zero ; NOP
addiu $v1, $v0, 1
mtc1 $v1, $f2
jr $ra
sub.s $f0, $f2, $f12 ; branch delay slot

Conclusion

It’s hard to say whether anyone would need this trickery in real-world code, but, as mentioned many
times in this book, this example serves well for explaining the IEEE 754 format and unions in C/C++.

1.26 FSCALE replacement

Agner Fog, in his work "Optimizing subroutines in assembly language / An optimization guide for x86 platforms"^169,
states that the FSCALE FPU instruction (calculating 2^n) may be slow on many CPUs, and he offers
a faster replacement.
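A replacement of this kind relies on the IEEE 754 single-precision layout. A float with sign bit s, 8-bit biased exponent E and 23-bit fraction F represents the value in the first line below. Setting s = 0 and F = 0 and storing N + 127 in the exponent field gives exactly 2^N, as long as the biased exponent stays in the normal range 1..254, that is, for -126 <= N <= 127:

```latex
\begin{aligned}
x &= (-1)^{s} \cdot 2^{E-127} \cdot \left(1 + \frac{F}{2^{23}}\right) \\
s = 0,\; F = 0,\; E = N + 127 \;&\Rightarrow\; x = 2^{N} \\
\text{bit pattern} &= (N + 127) \cdot 2^{23} = N \cdot 2^{23} + \mathtt{0x3F800000}
\end{aligned}
```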

Here is my translation of his assembly code to C/C++:

#include <stdint.h>
#include <stdio.h>

union uint_float
{
    uint32_t i;
    float f;
};

float flt_2n(int N)

(^169) http://www.agner.org/optimize/optimizing_assembly.pdf
