Assembly Language for Beginners

1.31. WORKING WITH FLOATING POINT NUMBERS USING SIMD

Listing 1.404: Optimizing MSVC 2012 x64

v$ = 8
calculate_machine_epsilon PROC
        movsdx  QWORD PTR v$[rsp], xmm0
        movaps  xmm1, xmm0
        inc     QWORD PTR v$[rsp]
        movsdx  xmm0, QWORD PTR v$[rsp]
        subsd   xmm0, xmm1
        ret     0
calculate_machine_epsilon ENDP

There is no way to add 1 to a value in a 128-bit XMM register, so it must be placed into memory.

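There is, however, the ADDSD instruction (Add Scalar Double-Precision Floating-Point Values), which can add a value to the lowest 64-bit half of an XMM register while ignoring the higher one, but MSVC 2012 is probably not that good yet^193. Note, though, that ADDSD performs a floating-point addition, whereas the INC in the listing increments the value's raw bit pattern as an integer, so the two are not interchangeable as is.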

Nevertheless, the value is then reloaded into an XMM register and the subtraction occurs. SUBSD is “Subtract Scalar Double-Precision Floating-Point Values”, i.e., it operates on the lower 64-bit part of a 128-bit XMM register. The result is returned in the XMM0 register.
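The steps in Listing 1.404 can be sketched in C as follows. This is a minimal illustration and not necessarily the source the listing was compiled from: the use of memcpy() for reinterpreting the bits is a choice made here for portability, and the local variable names are hypothetical. The comments map each statement to the instruction it roughly corresponds to.

```c
#include <assert.h>
#include <float.h>   /* DBL_EPSILON, handy for comparing the result */
#include <stdint.h>
#include <string.h>

/* Returns the gap between `start` and the next representable double
   above it (intended for positive, finite values of `start`). */
double calculate_machine_epsilon(double start)
{
    uint64_t bits;
    double next;

    /* The trick assumes double and uint64_t have the same size. */
    assert(sizeof bits == sizeof start);

    /* Spill the double to memory as raw bits
       (compare: movsdx QWORD PTR v$[rsp], xmm0). */
    memcpy(&bits, &start, sizeof bits);

    /* Integer increment of the bit pattern
       (compare: inc QWORD PTR v$[rsp]). */
    bits++;

    /* Reload the bits as a double
       (compare: movsdx xmm0, QWORD PTR v$[rsp]). */
    memcpy(&next, &bits, sizeof next);

    /* Scalar double-precision subtraction (compare: subsd xmm0, xmm1). */
    return next - start;
}
```

For a start value of 1.0 the function should return the same value as DBL_EPSILON from <float.h>, since incrementing the bit pattern of 1.0 yields the next representable double above it.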

1.31.5 Pseudo-random number generator example revisited

Let’s revisit the “pseudo-random number generator” example (listing 1.25.1).

If we compile this in MSVC 2012, it will use SIMD instructions for the floating-point arithmetic.

Listing 1.405: Optimizing MSVC 2012

__real@3f800000 DD 03f800000r   ; 1

tv128 = -4
_tmp$ = -4
?float_rand@@YAMXZ PROC
        push    ecx
        call    ?my_rand@@YAIXZ
; EAX=pseudorandom value
        and     eax, 8388607    ; 007fffffH
        or      eax, 1065353216 ; 3f800000H
; EAX=pseudorandom value & 0x007fffff | 0x3f800000
; store it into local stack:
        mov     DWORD PTR _tmp$[esp+4], eax
; reload it as float point number:
        movss   xmm0, DWORD PTR _tmp$[esp+4]
; subtract 1.0:
        subss   xmm0, DWORD PTR __real@3f800000
; move value to ST0 by placing it in temporary variable...
        movss   DWORD PTR tv128[esp+4], xmm0
; ... and reloading it into ST0:
        fld     DWORD PTR tv128[esp+4]
        pop     ecx
        ret     0
?float_rand@@YAMXZ ENDP

The SIMD instructions here (MOVSS and SUBSS) have the -SS suffix, which stands for “Scalar Single”.

“Scalar” implies that only one value is stored in the register.

“Single”^194 stands for the float data type.
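The bit manipulation visible in Listing 1.405 can be sketched in C like this. Only the masking with 0x007fffff, the OR with 0x3f800000 and the subtraction of 1.0 are taken from the listing. The body of my_rand(), the constants RNG_A and RNG_C, and the helper my_srand() are assumptions made for illustration (a common linear congruential generator), and memcpy() is used here to reinterpret the bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed LCG parameters: they are NOT visible in Listing 1.405. */
#define RNG_A 1664525u
#define RNG_C 1013904223u

static uint32_t rng_state;

/* Hypothetical seeding helper. */
void my_srand(uint32_t seed)
{
    rng_state = seed;
}

/* Assumed generator body; the listing only shows the call to it. */
uint32_t my_rand(void)
{
    rng_state = rng_state * RNG_A + RNG_C;
    return rng_state;
}

float float_rand(void)
{
    /* Keep 23 random mantissa bits and force the sign and exponent
       bits to those of 1.0f (0x3f800000), giving a value in [1.0, 2.0). */
    uint32_t bits = (my_rand() & 0x007fffffu) | 0x3f800000u;
    float f;

    /* The trick assumes float and uint32_t have the same size. */
    assert(sizeof f == sizeof bits);

    /* Reinterpret the integer as a float
       (compare: the mov to _tmp$ followed by movss). */
    memcpy(&f, &bits, sizeof f);

    /* Shift the range to [0.0, 1.0) (compare: subss with __real@3f800000). */
    return f - 1.0f;
}
```

Because the exponent bits are fixed to those of 1.0, the random bits only fill the mantissa, so every result should lie in the range from 0.0 (inclusive) to 1.0 (exclusive).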

1.31.6 Summary.

In all examples here, only the low part of an XMM register is used (the lower 64 bits for double, the lowest 32 bits for float), to store a single number in IEEE 754 format.

(^193) As an exercise, you may try to rework this code to eliminate the usage of the local stack.
(^194) I.e., single precision.
