Game Engine Architecture


    // test the two functions
    __m128 c = addWithAssembly(a, b);
    __m128 d = addWithIntrinsics(a, b);

    // store the original values back to check that they
    // weren't overwritten
    _mm_store_ps(&A[0], a);
    _mm_store_ps(&B[0], b);

    // store results into float arrays so we can print
    // them
    _mm_store_ps(&C[0], c);
    _mm_store_ps(&D[0], d);

    // inspect the results
    printf("%g %g %g %g\n", A[0], A[1], A[2], A[3]);
    printf("%g %g %g %g\n", B[0], B[1], B[2], B[3]);
    printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);
    printf("%g %g %g %g\n", D[0], D[1], D[2], D[3]);

    return 0;
}

4.7.1.4. Vector-Matrix Multiplication with SSE

Let’s take a look at how vector-matrix multiplication might be implemented
using SSE instructions. We want to multiply the 1 × 4 vector v with the 4 × 4
matrix M to generate a result vector r.

The multiplication involves taking the dot product of the row vector v
with each of the columns of matrix M. So to do this calculation using SSE
instructions, we might first try storing v in an SSE register (__m128), and
storing each of the columns of M in SSE registers as well. Then we could
calculate all of the products $v_k M_{kj}$ in parallel using only four mulps
instructions, like this:
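A minimal sketch of this first attempt, assuming the four columns of M have already been loaded into separate registers. The function name mulColumnsAttempt and the ColumnProducts struct are hypothetical, introduced only for illustration; _mm_mul_ps is the intrinsic that corresponds to the mulps instruction.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_mul_ps

// Hypothetical container for the four intermediate product registers.
struct ColumnProducts
{
    __m128 col1, col2, col3, col4;
};

// Multiply v component-wise with each column of M, one mulps per column.
// Register j then holds (vx*M1j, vy*M2j, vz*M3j, vw*M4j). These are only
// the individual products; no summation is performed here.
inline ColumnProducts mulColumnsAttempt(const __m128& v,
                                        const __m128& Mcol1,
                                        const __m128& Mcol2,
                                        const __m128& Mcol3,
                                        const __m128& Mcol4)
{
    ColumnProducts p;
    p.col1 = _mm_mul_ps(v, Mcol1);
    p.col2 = _mm_mul_ps(v, Mcol2);
    p.col3 = _mm_mul_ps(v, Mcol3);
    p.col4 = _mm_mul_ps(v, Mcol4);
    return p;
}
```

Each of the four registers still holds four separate products. To obtain one component of r, the four lanes within a single register would have to be added together.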

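Written out in full, the product is

$$
\mathbf{r} = \mathbf{v}\mathbf{M};
$$

$$
\begin{aligned}
\begin{bmatrix} r_x & r_y & r_z & r_w \end{bmatrix}
&= \begin{bmatrix} v_x & v_y & v_z & v_w \end{bmatrix}
\begin{bmatrix}
M_{11} & M_{12} & M_{13} & M_{14} \\
M_{21} & M_{22} & M_{23} & M_{24} \\
M_{31} & M_{32} & M_{33} & M_{34} \\
M_{41} & M_{42} & M_{43} & M_{44}
\end{bmatrix} \\
&= \begin{bmatrix}
(v_x M_{11} & (v_x M_{12} & (v_x M_{13} & (v_x M_{14} \\
{}+ v_y M_{21} & {}+ v_y M_{22} & {}+ v_y M_{23} & {}+ v_y M_{24} \\
{}+ v_z M_{31} & {}+ v_z M_{32} & {}+ v_z M_{33} & {}+ v_z M_{34} \\
{}+ v_w M_{41}) & {}+ v_w M_{42}) & {}+ v_w M_{43}) & {}+ v_w M_{44})
\end{bmatrix}.
\end{aligned}
$$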
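As a cross-check for any SIMD version, the product r = vM can also be evaluated with plain scalar code. This is a reference sketch only: the name mulVectorMatrixScalar and the float M[4][4] array layout are assumptions made for illustration, not taken from the text.

```cpp
// Scalar reference for r = vM, with v a 1x4 row vector and M a 4x4 matrix.
// Assumes M[i][j] stores the element in row i+1, column j+1 (M11 is M[0][0]).
inline void mulVectorMatrixScalar(const float v[4],
                                  const float M[4][4],
                                  float r[4])
{
    for (int j = 0; j < 4; ++j)
    {
        // Component j of r is the dot product of v with column j of M.
        r[j] = v[0] * M[0][j]
             + v[1] * M[1][j]
             + v[2] * M[2][j]
             + v[3] * M[3][j];
    }
}
```

Comparing the output of this function against an SSE implementation on a few known inputs is a simple way to catch a transposed matrix or a swapped lane.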

4.7. Hardware-Accelerated SIMD Math


