clauseallocator.(h|cpp) files), which gives access to allocated memory using 32-bit identifiers
instead of 64-bit pointers.
1.31 Working with floating point numbers using SIMD
Of course, the FPU remained in x86-compatible processors when the SIMD extensions were added.
The SIMD extensions (SSE2) offer an easier way to work with floating-point numbers.
The number format remains the same (IEEE 754).
So, modern compilers (including those generating code for x86-64) usually use SIMD instructions instead of
FPU ones.
This is good news, because SIMD instructions are easier to work with.
We are going to reuse the examples from the FPU section here: 1.19 on page 218.
1.31.1 Simple example
#include <stdio.h>

double f (double a, double b)
{
        return a/3.14 + b*4.1;
};

int main()
{
        printf ("%f\n", f(1.2, 3.4));
};
x64
Listing 1.394: Optimizing MSVC 2012 x64
__real@4010666666666666 DQ 04010666666666666r    ; 4.1
__real@40091eb851eb851f DQ 040091eb851eb851fr    ; 3.14

a$ = 8
b$ = 16
f       PROC
        divsd   xmm0, QWORD PTR __real@40091eb851eb851f
        mulsd   xmm1, QWORD PTR __real@4010666666666666
        addsd   xmm0, xmm1
        ret     0
f       ENDP
The input floating-point values are passed in the XMM0-XMM3 registers; all the rest are passed via the stack^191.
a is passed in XMM0, b in XMM1.
The XMM registers are 128-bit (as we know from the section about SIMD: 1.29 on page 406), but the
double values are 64-bit, so only the lower half of each register is used.
DIVSD is an SSE2 instruction that stands for “Divide Scalar Double-Precision Floating-Point Values”; it just
divides one value of type double by another, stored in the lower halves of the operands.
The constants are encoded by the compiler in IEEE 754 format.
MULSD and ADDSD work in the same way, but perform multiplication and addition.
The result of the function, of type double, is left in the XMM0 register.
That is how non-optimizing MSVC works:
(^191) MSDN: Parameter Passing