328 7. The Game Loop and Real-Time Simulation
3D math library, without having to modify much if any of the calling code
(except perhaps to ensure alignment of vector data to 16-byte boundaries).
7.6.3. Fork and Join
Another way to utilize multicore or multiprocessor hardware is to adapt
divide-and-conquer algorithms for parallelism. This is often called the
fork/join approach. The basic idea is to divide a unit of work into smaller
subunits, distribute these workloads onto multiple processing cores or
hardware threads (fork), and then merge the results once all workloads have
been completed (join). When applied to the game loop, the fork/join
architecture results in a main loop that looks very similar to its
single-threaded counterpart, but with some of the major phases of the update
loop being parallelized. This architecture is illustrated in Figure 7.6.
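As a rough illustration of the fork and join steps described above, here is a minimal C++ sketch that sums an array by splitting it into batches. It is not taken from any particular engine: the function name forkJoinSum and the use of std::async as the "fork" mechanism are assumptions made for this example, and a real engine would more likely hand batches to a pre-allocated pool of worker threads or a job system.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the text): sums `data` by forking up to
// `numBatches` asynchronous tasks and then joining on their partial sums.
long long forkJoinSum(const std::vector<int>& data, std::size_t numBatches)
{
    numBatches = std::max<std::size_t>(1, numBatches);

    // Round up so that every element lands in some batch.
    const std::size_t batchSize = (data.size() + numBatches - 1) / numBatches;

    // Fork: launch one task per non-empty batch.
    std::vector<std::future<long long>> partials;
    for (std::size_t begin = 0; begin < data.size(); begin += batchSize)
    {
        const std::size_t end = std::min(begin + batchSize, data.size());
        partials.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last  = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }

    // Join: block until each batch has finished, merging the results.
    long long total = 0;
    for (auto& partial : partials)
        total += partial.get();
    return total;
}
```

The calling code stays essentially sequential: it calls the function and receives the merged result, which is why a fork/join main loop can look so similar to its single-threaded counterpart.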
Let’s take a look at a concrete example. Blending animations using linear
interpolation (LERP) is an operation that can be done on each joint
independently of all other joints within a skeleton (see Section 11.5.2.2).
We’ll assume that we want to blend pairs of skeletal poses for five
characters, each of which has 100 joints, meaning that we need to process
500 pairs of joint poses.
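To make the per-joint independence concrete, the following single-threaded C++ sketch blends two arrays of joint poses. The JointPose layout is a deliberately simplified assumption for this example (translation and uniform scale only, with the rotation component omitted), and the names lerpScalar, blendJoint and blendPoses are hypothetical. The point is only that each output element depends on exactly one input pair.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified joint pose: translation and uniform scale only.
// A full pose would also carry a rotation, which is omitted here.
struct JointPose
{
    float tx, ty, tz;
    float scale;
};

// Linear interpolation between a and b; beta = 0 yields a, beta = 1 yields b.
inline float lerpScalar(float a, float b, float beta)
{
    return a + beta * (b - a);
}

// Blends one pair of joint poses. Reads nothing but its own two inputs.
inline JointPose blendJoint(const JointPose& a, const JointPose& b, float beta)
{
    return JointPose{ lerpScalar(a.tx, b.tx, beta),
                      lerpScalar(a.ty, b.ty, beta),
                      lerpScalar(a.tz, b.tz, beta),
                      lerpScalar(a.scale, b.scale, beta) };
}

// Blends every pair in order. No iteration depends on any other, so any
// contiguous sub-range of indices could be processed separately.
void blendPoses(const std::vector<JointPose>& poseA,
                const std::vector<JointPose>& poseB,
                float beta,
                std::vector<JointPose>& out)
{
    const std::size_t count = std::min(poseA.size(), poseB.size());
    out.resize(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = blendJoint(poseA[i], poseB[i], beta);
}
```

For the scenario above, poseA and poseB would each hold 500 entries (five characters times 100 joints).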
To parallelize this task, we can divide the work into N batches, each con-
taining roughly 500/N joint-pose pairs, where N is selected based on the avail-
[Figure 7.6. Fork and join used to parallelize selected CPU-intensive phases of the game loop. Diagram labels: Main Thread phases HID, Update Game Objects, Ragdoll Physics, Post Animation Game Object Update, etc.; forked workloads Pose Blending (three in parallel) and Simulate/Integrate (three in parallel), each group bracketed by a Fork and a Join.]