CHAPTER 27. WORKING WITH FLOATING POINT NUMBERS USING SIMD
The compiler encodes the constants in IEEE 754 format.
MULSD and ADDSD work the same way, but perform multiplication and addition.
The function's result, of type double, is left in the XMM0 register.
That is how non-optimizing MSVC works:
Listing 27.2: MSVC 2012 x64
real@4010666666666666 DQ 04010666666666666r    ; 4.1
real@40091eb851eb851f DQ 040091eb851eb851fr    ; 3.14
a$ = 8
b$ = 16
f       PROC
        movsdx  QWORD PTR [rsp+16], xmm1
        movsdx  QWORD PTR [rsp+8], xmm0
        movsdx  xmm0, QWORD PTR a$[rsp]
        divsd   xmm0, QWORD PTR real@40091eb851eb851f
        movsdx  xmm1, QWORD PTR b$[rsp]
        mulsd   xmm1, QWORD PTR real@4010666666666666
        addsd   xmm0, xmm1
        ret     0
f       ENDP
This is slightly redundant. The input arguments are saved in the “shadow space” (8.2.1 on page 91), but only the lower halves of
the registers, i.e., only the 64-bit values of type double. GCC produces the same code.
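The names of the two constants in Listing 27.2 embed their IEEE 754 bit patterns, and this can be checked from C. The following is a minimal sketch, not taken from the original text: the helper names double_bits() and check_listing_constants() are hypothetical, and it assumes that double is a 64-bit IEEE 754 value, as it is with MSVC and GCC on x86 and x64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double.
   Assumes double is IEEE 754 binary64 (true for MSVC and GCC on x86/x64). */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* copy the bytes instead of casting pointers */
    return u;
}

/* Hypothetical helper: compare against the hexadecimal values
   that appear in the constant names of Listing 27.2. */
void check_listing_constants(void)
{
    assert(double_bits(3.14) == 0x40091eb851eb851fULL);
    assert(double_bits(4.1)  == 0x4010666666666666ULL);
}
```

Calling check_listing_constants() from main() returns silently when the compiler encodes 3.14 and 4.1 exactly as in the listing, and aborts otherwise.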
27.1.2 x86
Let’s also compile this example for x86. Even though it is generating code for x86, MSVC 2012 uses SSE2 instructions:
Listing 27.3: Non-optimizing MSVC 2012 x86
tv70 = -8       ; size = 8
_a$ = 8         ; size = 8
_b$ = 16        ; size = 8
_f      PROC
        push    ebp
        mov     ebp, esp
        sub     esp, 8
        movsd   xmm0, QWORD PTR _a$[ebp]
        divsd   xmm0, QWORD PTR __real@40091eb851eb851f
        movsd   xmm1, QWORD PTR _b$[ebp]
        mulsd   xmm1, QWORD PTR __real@4010666666666666
        addsd   xmm0, xmm1
        movsd   QWORD PTR tv70[ebp], xmm0
        fld     QWORD PTR tv70[ebp]
        mov     esp, ebp
        pop     ebp
        ret     0
_f      ENDP
Listing 27.4: Optimizing MSVC 2012 x86
tv67 = 8        ; size = 8
_a$ = 8         ; size = 8
_b$ = 16        ; size = 8
_f      PROC
        movsd   xmm1, QWORD PTR _a$[esp-4]
        divsd   xmm1, QWORD PTR __real@40091eb851eb851f
        movsd   xmm0, QWORD PTR _b$[esp-4]
        mulsd   xmm0, QWORD PTR __real@4010666666666666
        addsd   xmm1, xmm0
        movsd   QWORD PTR tv67[esp-4], xmm1
        fld     QWORD PTR tv67[esp-4]
        ret     0
_f      ENDP
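For reference while reading the listings, here is a C function consistent with all three of them. It is reconstructed from the instruction sequence (divide a by 3.14, multiply b by 4.1, add the two results), so treat it as an assumption rather than the exact source that was compiled. check_f() is a hypothetical helper added only to exercise it.

```c
#include <assert.h>

/* Reconstructed from the listings (an assumption, not the quoted source):
   DIVSD computes a / 3.14, MULSD computes b * 4.1, ADDSD sums them. */
double f(double a, double b)
{
    return a / 3.14 + b * 4.1;
}

/* Hypothetical self-check with easy-to-verify inputs. */
void check_f(void)
{
    double r = f(3.14, 1.0);          /* 3.14 / 3.14 + 1.0 * 4.1 */
    assert(r > 5.0999 && r < 5.1001); /* tolerance instead of exact equality */
    assert(f(0.0, 0.0) == 0.0);
}
```

In both x86 listings the sum is first stored to a temporary (tv70 or tv67) and then loaded with FLD, because the 32-bit calling convention returns a double in ST(0) rather than in XMM0.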