Game Engine Architecture

4. 3D Math for Games

a single 128-bit register; four operations such as additions or multiplications are performed in parallel on four pairs of floats using a single instruction. This is just what the doctor ordered when multiplying a four-element vector by a 4 × 4 matrix!

4.7.1.1. SSE Registers

In packed 32-bit floating-point mode, each 128-bit SSE register contains four 32-bit floats. The individual floats within an SSE register are conveniently referred to as [ x y z w ], just as they would be when doing vector/matrix math in homogeneous coordinates on paper (see Figure 4.30). To see how the SSE registers work, here’s an example of a SIMD instruction:

    addps xmm0, xmm1

The addps instruction adds the four floats in the 128-bit XMM0 register with the four floats in the XMM1 register, and stores the four results back into XMM0. Put another way:

    xmm0.x = xmm0.x + xmm1.x;
    xmm0.y = xmm0.y + xmm1.y;
    xmm0.z = xmm0.z + xmm1.z;
    xmm0.w = xmm0.w + xmm1.w;

The four floating-point values stored in an SSE register can be extracted to or loaded from memory or registers individually, but such operations tend to be comparatively slow. Moving data between the x87 FPU registers and the SSE registers is particularly bad, because the CPU has to wait for either the x87 or the SSE unit to spit out its pending calculations. This stalls out the CPU’s entire instruction execution pipeline and results in a lot of wasted cycles. In a nutshell, code that mixes regular float mathematics with SSE mathematics should be avoided like the plague.

To minimize the costs of going back and forth between memory, x87 FPU registers, and SSE registers, most SIMD math libraries do their best to leave data in the SSE registers for as long as possible. This means that even scalar values are left in SSE registers, rather than transferring them out to float variables. For example, a dot product between two vectors produces a scalar result, but if we leave that result in an SSE register it can be used later in other

    |    x    |    y    |    z    |    w    |
      32 bits   32 bits   32 bits   32 bits

Figure 4.30. The four components of an SSE register in 32-bit floating-point mode.
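The addps behavior and the idea of keeping a scalar result inside an SSE register can be sketched with compiler intrinsics instead of raw assembly. This is only an illustrative sketch, assuming an x86 or x86-64 compiler that provides <xmmintrin.h>. The helper names makeVec, storeVec, addVec, dotVec, and example are hypothetical and are not taken from the text; the intrinsics themselves (_mm_setr_ps, _mm_storeu_ps, _mm_add_ps, _mm_mul_ps, _mm_shuffle_ps) are the standard SSE ones.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps, ...

// Hypothetical helper: builds a register laid out as [ x y z w ],
// with x in the lowest 32 bits. _mm_setr_ps takes lanes in memory order.
inline __m128 makeVec(float x, float y, float z, float w)
{
    return _mm_setr_ps(x, y, z, w);
}

// Hypothetical helper: copies the four lanes out to memory as x, y, z, w.
// The unaligned store is used so 'out' needs no special alignment.
inline void storeVec(__m128 v, float out[4])
{
    _mm_storeu_ps(out, v);
}

// Intrinsic counterpart of "addps xmm0, xmm1": four additions at once.
inline __m128 addVec(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

// Four-component dot product whose scalar result never leaves the SSE
// unit: the sum ends up replicated in all four lanes of the return value.
inline __m128 dotVec(__m128 a, __m128 b)
{
    __m128 p = _mm_mul_ps(a, b);  // (ax*bx, ay*by, az*bz, aw*bw)

    // Swap neighbouring lanes: (p.y, p.x, p.w, p.z).
    __m128 s = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sum = _mm_add_ps(p, s);  // (x+y, x+y, z+w, z+w)

    // Swap the two halves: (z+w, z+w, x+y, x+y).
    s = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_ps(sum, s);      // every lane holds x+y+z+w
}

// Small usage check; the inputs are exactly representable as floats.
inline void example()
{
    __m128 a = makeVec(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = makeVec(10.0f, 20.0f, 30.0f, 40.0f);

    float r[4];
    storeVec(addVec(a, b), r);
    assert(r[0] == 11.0f && r[1] == 22.0f && r[2] == 33.0f && r[3] == 44.0f);

    // The dot product stays in a register and scales 'a' directly,
    // without a round trip through a float variable.
    __m128 scaled = _mm_mul_ps(a, dotVec(a, b));
    storeVec(scaled, r);
    assert(r[0] == 300.0f && r[3] == 1200.0f);
}
```

Because dotVec returns its result replicated across all four lanes, the caller can feed it straight into further SIMD arithmetic, as the scaling step in example() does. The values are only stored to memory at the very end, when a plain float array is actually needed.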
