1.31. WORKING WITH FLOATING POINT NUMBERS USING SIMD
Listing 1.404: Optimizing MSVC 2012 x64
v$ = 8
calculate_machine_epsilon PROC
movsdx QWORD PTR v$[rsp], xmm0
movaps xmm1, xmm0
inc QWORD PTR v$[rsp]
movsdx xmm0, QWORD PTR v$[rsp]
subsd xmm0, xmm1
ret 0
calculate_machine_epsilon ENDP
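An integer 1 is added here to the raw 64-bit pattern of the value, and the compiler does not do that
inside a 128-bit XMM register, so the value must be placed into memory.
There is, however, the ADDSD instruction (Add Scalar Double-Precision Floating-Point Values), which can
add a value to the lowest 64-bit half of an XMM register while ignoring the higher one, but it performs
floating-point addition rather than an integer increment. SSE2 also offers integer additions such as
PADDQ, but MSVC 2012 probably is not that good yet^193.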
Nevertheless, the value is then reloaded into an XMM register and the subtraction occurs. SUBSD is “Subtract
Scalar Double-Precision Floating-Point Values”, i.e., it operates on the lower 64-bit part of a 128-bit XMM
register. The result is returned in the XMM0 register.
1.31.5 Pseudo-random number generator example revisited
Let’s revisit the “pseudo-random number generator” example (listing 1.25.1).
If we compile this in MSVC 2012, it will use SIMD instructions for the floating-point operations.
Listing 1.405: Optimizing MSVC 2012
__real@3f800000 DD 03f800000r ; 1
tv128 = -4
_tmp$ = -4
?float_rand@@YAMXZ PROC
push ecx
call ?my_rand@@YAIXZ
; EAX=pseudorandom value
and eax, 8388607 ; 007fffffH
or eax, 1065353216 ; 3f800000H
; EAX=pseudorandom value & 0x007fffff | 0x3f800000
; store it into local stack:
mov DWORD PTR _tmp$[esp+4], eax
; reload it as a floating-point number:
movss xmm0, DWORD PTR _tmp$[esp+4]
; subtract 1.0:
subss xmm0, DWORD PTR __real@3f800000
; move value to ST0 by placing it in temporary variable...
movss DWORD PTR tv128[esp+4], xmm0
; ... and reloading it into ST0:
fld DWORD PTR tv128[esp+4]
pop ecx
ret 0
?float_rand@@YAMXZ ENDP
All SIMD instructions here (MOVSS, SUBSS) have the -SS suffix, which stands for “Scalar Single”.
“Scalar” implies that only one value is stored in the register.
“Single”^194 stands for the float data type.
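The bit manipulation performed in Listing 1.405 can be sketched in C as follows. The helper name bits_to_unit_float() is hypothetical and takes the pseudorandom value as a parameter instead of calling my_rand(); memcpy() stands in for the store to the local stack and the reload via MOVSS.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map arbitrary 32 bits to a float in [0, 1). */
float bits_to_unit_float(uint32_t random_bits)
{
    /* Keep the 23 mantissa bits and force sign/exponent to those of 1.0f,
       which gives a float in the range [1, 2). */
    uint32_t bits = (random_bits & 0x007fffffu) | 0x3f800000u;
    float f;

    memcpy(&f, &bits, sizeof f);  /* reinterpret the bits as a float */
    return f - 1.0f;              /* subss: shift [1, 2) down to [0, 1) */
}
```

For example, an input with only mantissa bit 22 set (0x00400000) corresponds to 1.5f before the subtraction, so the function returns 0.5f.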
1.31.6 Summary
Only the lowest part of the XMM registers is used in all examples here (the lower 64-bit half for double,
the lowest 32 bits for float), to store a single number in IEEE 754 format.
(^193) As an exercise, you may try to rework this code to eliminate the usage of the local stack.
(^194) I.e., single precision.