Game Engine Architecture

4. 3D Math for Games

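```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_mul_ps

// First attempt at r = vM: multiply v component-wise by each column of M.
__m128 mulVectorMatrixAttempt1(__m128 v,
                               __m128 Mcol1, __m128 Mcol2,
                               __m128 Mcol3, __m128 Mcol4)
{
    __m128 vMcol1 = _mm_mul_ps(v, Mcol1);
    __m128 vMcol2 = _mm_mul_ps(v, Mcol2);
    __m128 vMcol3 = _mm_mul_ps(v, Mcol3);
    __m128 vMcol4 = _mm_mul_ps(v, Mcol4);

    // ... then what?
}
```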

The above code would yield the following intermediate results:

$$
\begin{aligned}
\texttt{vMcol1} &= [\, v_x M_{11} \quad v_y M_{21} \quad v_z M_{31} \quad v_w M_{41} \,];\\
\texttt{vMcol2} &= [\, v_x M_{12} \quad v_y M_{22} \quad v_z M_{32} \quad v_w M_{42} \,];\\
\texttt{vMcol3} &= [\, v_x M_{13} \quad v_y M_{23} \quad v_z M_{33} \quad v_w M_{43} \,];\\
\texttt{vMcol4} &= [\, v_x M_{14} \quad v_y M_{24} \quad v_z M_{34} \quad v_w M_{44} \,].
\end{aligned}
$$

But the problem with doing it this way is that we now have to add “across the registers” in order to generate the results we need. For example, $r_x = v_x M_{11} + v_y M_{21} + v_z M_{31} + v_w M_{41}$, so we’d need to add the four components of vMcol1 together. Adding across a register like this is difficult and inefficient (the original SSE instruction set has no horizontal-add instruction at all; haddps arrived only with SSE3), and moreover it leaves the four components of the result in four separate SSE registers, which would need to be combined into the single result vector r. We can do better.

The “trick” here is to multiply with the rows of M, not its columns. That way, we’ll have results that we can add in parallel, and the final sums will end up in the four components of a single SSE register representing the output vector r. However, we don’t want to multiply v as-is with the rows of M. Instead, we want to multiply $v_x$ with all of row 1, $v_y$ with all of row 2, $v_z$ with all of row 3, and $v_w$ with all of row 4. To do this, we need to replicate a single component of v, such as $v_x$, across a register to yield a vector like $[\, v_x \;\; v_x \;\; v_x \;\; v_x \,]$. Then we can multiply the replicated component vectors by the appropriate rows of M, and adding the four products component-wise puts $r_j = v_x M_{1j} + v_y M_{2j} + v_z M_{3j} + v_w M_{4j}$ in the $j$th component of a single register ($j = 1, \dots, 4$), which is exactly the $j$th component of $r = vM$.

Thankfully there’s a powerful SSE instruction which can replicate values like this. It is called shufps, and it’s wrapped by the intrinsic _mm_shuffle_ps(). This beast is a bit complicated to understand, because it’s a general-purpose instruction that can shuffle the components of an SSE register around in arbitrary ways. However, for our purposes we need only know that the following macros replicate the x, y, z or w components of a vector across an entire register:

```cpp
// Packs four 2-bit source-lane indices into the 8-bit immediate that
// _mm_shuffle_ps() expects: x picks the source lane for destination lane 0,
// y for lane 1, z for lane 2 and w for lane 3 (lanes 0 and 1 are taken from
// the first operand, lanes 2 and 3 from the second).
#define SHUFFLE_PARAM(x, y, z, w) \
    ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))
```
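Putting the row-based trick together, a complete vector-matrix product might look like the following. This is a minimal sketch, not the book's own listing: it assumes the row-vector convention $r = vM$ used above, an x86/x86-64 target with SSE (the default on x86-64), and M supplied as four registers holding its rows. The macros REPLICATE_X through REPLICATE_W and the function name mulVectorMatrixRows are hypothetical names chosen for this example; SHUFFLE_PARAM is repeated from the text so the listing compiles on its own.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_mul_ps, _mm_add_ps, _mm_shuffle_ps
#include <cassert>

// Repeated from the text: packs four 2-bit lane indices into the
// immediate operand of _mm_shuffle_ps() (x -> lane 0, ..., w -> lane 3).
#define SHUFFLE_PARAM(x, y, z, w) \
    ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))

// Hypothetical helper macros (our own names, not from the text): each one
// replicates a single component of v across all four lanes, e.g.
// REPLICATE_X(v) yields [vx vx vx vx]. Passing v as both operands of
// _mm_shuffle_ps() lets every destination lane select from the same register.
#define REPLICATE_X(v) _mm_shuffle_ps((v), (v), SHUFFLE_PARAM(0, 0, 0, 0))
#define REPLICATE_Y(v) _mm_shuffle_ps((v), (v), SHUFFLE_PARAM(1, 1, 1, 1))
#define REPLICATE_Z(v) _mm_shuffle_ps((v), (v), SHUFFLE_PARAM(2, 2, 2, 2))
#define REPLICATE_W(v) _mm_shuffle_ps((v), (v), SHUFFLE_PARAM(3, 3, 3, 3))

// Hypothetical name. Computes r = vM for a row vector v, with M passed as its
// four rows. Component j of the result (j = 1..4) is
// vx*M1j + vy*M2j + vz*M3j + vw*M4j.
__m128 mulVectorMatrixRows(__m128 v,
                           __m128 Mrow1, __m128 Mrow2,
                           __m128 Mrow3, __m128 Mrow4)
{
    // Scale each row of M by the matching replicated component of v.
    __m128 xMrow1 = _mm_mul_ps(REPLICATE_X(v), Mrow1);
    __m128 yMrow2 = _mm_mul_ps(REPLICATE_Y(v), Mrow2);
    __m128 zMrow3 = _mm_mul_ps(REPLICATE_Z(v), Mrow3);
    __m128 wMrow4 = _mm_mul_ps(REPLICATE_W(v), Mrow4);

    // Lane-wise adds only: nothing has to be summed across a register,
    // and the result is already packed into one register.
    __m128 r = _mm_add_ps(xMrow1, yMrow2);
    r = _mm_add_ps(r, zMrow3);
    r = _mm_add_ps(r, wMrow4);
    return r;
}
```

As a check, with v = [1 2 3 4] and rows [1 2 3 4], [5 6 7 8], [9 10 11 12] and [13 14 15 16], the sketch returns [90 100 110 120], matching the scalar sums $r_j = \sum_i v_i M_{ij}$. The cross-register work of the first attempt has been replaced by four shuffles and three ordinary lane-wise adds.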
