Game Engine Architecture


188 4. 3D Math for Games


__m128 addWithAssembly(__m128 a, __m128 b)
{
    __m128 r;
    __asm
    {
        movaps xmm0, xmmword ptr [a]
        movaps xmm1, xmmword ptr [b]
        addps  xmm0, xmm1
        movaps xmmword ptr [r], xmm0
    }
    return r;
}

__m128 addWithIntrinsics(__m128 a, __m128 b)
{
    __m128 r = _mm_add_ps(a, b);
    return r;
}

In the assembly language version, we have to use the __asm keyword (a
Microsoft Visual C++ extension) to invoke inline assembly instructions, and
we must create the linkage between the input parameters a and b and the SSE
registers xmm0 and xmm1 manually, via movaps instructions. The version using
intrinsics, on the other hand, is much clearer and shorter. There is no
inline assembly, and the SSE instruction looks just like a regular function
call.
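As an aside, the point that an intrinsic reads like an ordinary function call can be sketched with plain float arrays. The function name addFourFloats and its parameter names are hypothetical and do not come from the text. This sketch also assumes the unaligned load and store intrinsics _mm_loadu_ps() and _mm_storeu_ps(), so that it makes no demands on how the caller's arrays are aligned.

```cpp
#include <xmmintrin.h>

// Hypothetical helper (not from the text): adds four packed floats from a
// and b and writes the four sums to out. The unaligned load/store
// intrinsics are used so the pointers need not be 16-byte aligned.
void addFourFloats(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);      // load a[0..3] into an SSE register
    __m128 vb = _mm_loadu_ps(b);      // load b[0..3] into an SSE register
    __m128 vr = _mm_add_ps(va, vb);   // four additions in one instruction
    _mm_storeu_ps(out, vr);           // write the four sums back to memory
}
```

Calling addFourFloats() on the arrays {2, -1, 3, 4} and {-1, 3, 4, 2} should leave {1, 2, 7, 6} in the output array. The compiler chooses the registers and emits the moves, which is the bookkeeping the inline assembly version had to spell out by hand.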
If you’d like to experiment with these example functions, they can be
invoked via the following test bed main() function. Notice the use of another
intrinsic, _mm_load_ps(), which loads values from an in-memory array of
floats into an __m128 variable (i.e., into an SSE register). Also notice that
we are forcing our four global float arrays to be 16-byte aligned via the
__declspec(align(16)) directive. _mm_load_ps() requires a 16-byte-aligned
address, so if we omit these directives the arrays are no longer guaranteed
to be aligned, and the program will crash whenever one of them is not.

#include <xmmintrin.h>

// ... function definitions from above ...
__declspec(align(16)) float A[] = { 2.0f, -1.0f, 3.0f, 4.0f };
__declspec(align(16)) float B[] = { -1.0f, 3.0f, 4.0f, 2.0f };
__declspec(align(16)) float C[] = { 0.0f, 0.0f, 0.0f, 0.0f };
__declspec(align(16)) float D[] = { 0.0f, 0.0f, 0.0f, 0.0f };

int main(int argc, char* argv[])
{
    // load a and b from floating-point data arrays above
    __m128 a = _mm_load_ps(&A[0]);
    __m128 b = _mm_load_ps(&B[0]);