Game Engine Architecture


// test the two functions
__m128 c = addWithAssembly(a, b);
__m128 d = addWithIntrinsics(a, b);

// store the original values back to check that they
// weren’t overwritten
_mm_store_ps(&A[0], a);
_mm_store_ps(&B[0], b);

// store results into float arrays so we can print
// them
_mm_store_ps(&C[0], c);
_mm_store_ps(&D[0], d);
// inspect the results
printf("%g %g %g %g\n", A[0], A[1], A[2], A[3]);
printf("%g %g %g %g\n", B[0], B[1], B[2], B[3]);
printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
printf("%g %g %g %g\n", D[0], D[1], D[2], D[3]);
return 0;
}

4.7.1.4. Vector-Matrix Multiplication with SSE


Let’s take a look at how vector-matrix multiplication might be implemented
using SSE instructions. We want to multiply the 1 × 4 vector v with the 4 × 4
matrix M to generate a result vector r:

$$
\mathbf{r} = \mathbf{v}\mathbf{M};
$$

$$
\begin{bmatrix} r_x & r_y & r_z & r_w \end{bmatrix}
=
\begin{bmatrix} v_x & v_y & v_z & v_w \end{bmatrix}
\begin{bmatrix}
M_{11} & M_{12} & M_{13} & M_{14} \\
M_{21} & M_{22} & M_{23} & M_{24} \\
M_{31} & M_{32} & M_{33} & M_{34} \\
M_{41} & M_{42} & M_{43} & M_{44}
\end{bmatrix}
=
\begin{bmatrix}
(v_x M_{11} & (v_x M_{12} & (v_x M_{13} & (v_x M_{14} \\
{}+ v_y M_{21} & {}+ v_y M_{22} & {}+ v_y M_{23} & {}+ v_y M_{24} \\
{}+ v_z M_{31} & {}+ v_z M_{32} & {}+ v_z M_{33} & {}+ v_z M_{34} \\
{}+ v_w M_{41}) & {}+ v_w M_{42}) & {}+ v_w M_{43}) & {}+ v_w M_{44})
\end{bmatrix}.
$$

The multiplication involves taking the dot product of the row vector v
with the columns of matrix M. So to do this calculation using SSE instructions,
we might first try storing v in an SSE register (__m128), and storing each of
the columns of M in SSE registers as well. Then we could calculate all of the
products $v_k M_{ij}$ in parallel using only four mulps instructions, like this:
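A minimal sketch of this first attempt, under stated assumptions, might look as follows. The function names mulVectorByColumns and mulVectorMatrixScalar, and the column-by-column float layout of M, are hypothetical choices made for this illustration rather than anything defined in the text. The sketch uses the unaligned load and store intrinsics so that it does not depend on the 16-byte alignment of the caller's arrays.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, ... */

/* Hypothetical helper (name and layout are assumptions of this sketch).
 * Mcols stores M column by column: Mcols[4*j + k] holds M(k+1, j+1).
 * products receives all 16 products, four per column of M. */
void mulVectorByColumns(const float v[4], const float Mcols[16],
                        float products[16])
{
    /* load v once; it is reused for every column */
    __m128 vReg = _mm_loadu_ps(v);

    for (int j = 0; j < 4; ++j)
    {
        /* one column of M in one SSE register */
        __m128 col = _mm_loadu_ps(&Mcols[4 * j]);

        /* a single multiply yields the four products for this column */
        __m128 prod = _mm_mul_ps(vReg, col);

        _mm_storeu_ps(&products[4 * j], prod);
    }
}

/* Scalar reference for r = vM using the same column-by-column layout,
 * handy for checking the SIMD products. */
void mulVectorMatrixScalar(const float v[4], const float Mcols[16],
                           float r[4])
{
    for (int j = 0; j < 4; ++j)
    {
        r[j] = 0.0f;
        for (int k = 0; k < 4; ++k)
            r[j] += v[k] * Mcols[4 * j + k];
    }
}
```

Each group of four products corresponds to one column of the expanded matrix above. The four multiplies alone do not produce r: the four terms within each register still have to be added together to obtain the corresponding component of r.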


4.7. Hardware-Accelerated SIMD Math
