The SPUs never read directly from main RAM. Instead, a direct memory
access (DMA) controller allows blocks of data to be copied back and forth
between system RAM and the SPUs’ local stores. These data transfers happen
in parallel, so both the PPU and SPUs can be doing useful calculations while
they wait for data to arrive.
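The overlap of transfer and computation described above is often organized as double buffering: while one local buffer is being processed, the next block is already on its way into the other. The following is only a minimal sketch of that control flow. The names `DmaRequest`, `startDmaGet`, `waitDma`, and `kChunk` are hypothetical and are not part of any real Cell SDK, and the "DMA" is simulated with an immediate `memcpy` so that the code runs on any machine. On real hardware the copy would proceed in the background and the wait would block until it finished.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Size of one local buffer, in floats (kept tiny for illustration).
constexpr std::size_t kChunk = 4;

// Hypothetical stand-in for a DMA request handle (not a real SDK type).
struct DmaRequest
{
    bool pending = false;
};

// Simulated DMA "get": copies count floats from main RAM into a local buffer.
// Assumption: real hardware would return at once and copy asynchronously;
// this simulation copies immediately so the sketch is portable.
inline DmaRequest startDmaGet(float* localDst, const float* mainSrc,
                              std::size_t count)
{
    assert(count <= kChunk);
    std::memcpy(localDst, mainSrc, count * sizeof(float));
    return DmaRequest{true};
}

// Simulated wait: real hardware would block here until the transfer is done.
inline void waitDma(DmaRequest& request)
{
    request.pending = false;
}

// Sums an array held in "main RAM" by streaming it through two local buffers.
inline float sumWithDoubleBuffering(const std::vector<float>& mainRam)
{
    const std::size_t total = mainRam.size();
    if (total == 0)
        return 0.0f;

    const std::size_t numChunks = (total + kChunk - 1) / kChunk;
    auto chunkSize = [total](std::size_t i) {
        const std::size_t remaining = total - i * kChunk;
        return remaining < kChunk ? remaining : kChunk;
    };

    float localStore[2][kChunk];
    DmaRequest requests[2];
    float sum = 0.0f;

    // Prime the pipeline with the first block.
    requests[0] = startDmaGet(localStore[0], mainRam.data(), chunkSize(0));

    for (std::size_t i = 0; i < numChunks; ++i)
    {
        const std::size_t cur = i % 2;
        const std::size_t next = (i + 1) % 2;

        // Kick off the transfer of the next block before touching this one.
        if (i + 1 < numChunks)
        {
            requests[next] = startDmaGet(localStore[next],
                                         mainRam.data() + (i + 1) * kChunk,
                                         chunkSize(i + 1));
        }

        // Make sure the current block has arrived, then process it.
        waitDma(requests[cur]);
        for (std::size_t j = 0; j < chunkSize(i); ++j)
            sum += localStore[cur][j];
    }
    return sum;
}
```

Two buffers are the minimum needed to hide transfer latency behind computation. Whether that is enough in practice depends on how long a transfer takes relative to the work done per block.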
7.6.2. SIMD
As we saw in Section 4.7, most modern CPUs (including the Xbox 360’s three
PowerPC processors, and the PS3’s PPU and SPUs) provide a class of instructions
known as single instruction, multiple data (SIMD). Such instructions can
perform a particular operation on more than one piece of data simultaneously,
and as such they represent a fine-grained form of hardware parallelism. CPUs
provide a number of different SIMD instruction variants, but by far the most
commonly used in games are instructions that operate on four 32-bit floating-point
values in parallel, because they allow 3D vector and matrix math to be
performed up to four times more quickly than with their single instruction, single
data (SISD) counterparts.
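To make the four-wide idea concrete, here is a small sketch that adds two four-float vectors, first one component at a time and then with a single SIMD add. It uses x86 SSE intrinsics purely because they are widely available; the consoles discussed in the text use different vector instruction sets, so this is an illustration of the concept, not code for those platforms. The `Vec4` type, the function names, and the `SKETCH_HAS_SSE` macro are hypothetical. When SSE is not available the code falls back to the scalar path.

```cpp
// Use SSE only when the compiler advertises support for it.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
    #include <xmmintrin.h>
    #define SKETCH_HAS_SSE 1
#else
    #define SKETCH_HAS_SSE 0
#endif

// Hypothetical vector type: four 32-bit floats stored contiguously.
struct Vec4
{
    float v[4];
};

// SISD version: four separate scalar additions.
inline Vec4 addScalar(const Vec4& a, const Vec4& b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

// SIMD version: one instruction adds all four lanes at once.
inline Vec4 addSimd(const Vec4& a, const Vec4& b)
{
#if SKETCH_HAS_SSE
    const __m128 va = _mm_loadu_ps(a.v);    // load four floats (unaligned)
    const __m128 vb = _mm_loadu_ps(b.v);
    Vec4 r;
    _mm_storeu_ps(r.v, _mm_add_ps(va, vb)); // four adds in parallel
    return r;
#else
    return addScalar(a, b);                 // portable fallback
#endif
}
```

Both functions return the same result. The SIMD path replaces four arithmetic instructions with one, although the loads and stores around it mean that the measured speedup is usually somewhat less than the ideal factor of four.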
Retrofitting existing 3D math code to leverage SIMD instructions can be
tricky, although the task is much easier if a well-encapsulated 3D math library
was used in the original code. For example, if a dot product is calculated
in long hand everywhere (e.g., float d = a.x * b.x + a.y * b.y
+ a.z * b.z;), then a very large amount of code will need to be rewritten.
However, if dot products are calculated by calling a function (e.g., float d =
Dot(a, b);), and if vectors are treated largely as black boxes throughout the
code base, then retrofitting for SIMD can be accomplished by modifying the
Figure 7.5. Simplified view of the PS3’s cell broadband architecture. (Diagram labels: GPU with video RAM (256 MB); system RAM (256 MB); PPU with L1 data, L1 instruction, and L2 caches; SPU0, SPU1, ..., SPU6, each with a 256 kB local store; DMA; DMA bus controller.)