220 5. Engine Support Systems
blocks to be shifted each frame, for some small value of N like 8 or 16. If our
game is running at 30 frames per second, then each frame lasts 1/30 of a sec-
ond (33 ms). So the heap can usually be completely defragmented in less than
one second without having any noticeable effect on the game's frame rate. As
long as allocations and deallocations aren't happening at a faster rate than
the defragmentation shifts, the heap will remain mostly defragmented at all
times.
This approach is only valid when the size of each block is relatively small,
so that the time required to move a single block does not exceed the time
allotted to relocation each frame. If very large blocks need to be relocated, we
can often break them up into two or more subblocks, each of which can be
relocated independently. This hasn’t proved to be a problem in Naughty Dog’s
engine, because relocation is only used for dynamic game objects, and they
are never larger than a few kilobytes—and usually much smaller.
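The amortized, N-shifts-per-frame strategy described above can be sketched with a toy model. Everything here is illustrative: the slot-based heap layout, the `kMaxShiftsPerFrame` budget, and the block-id representation are assumptions for the sketch, not the actual engine's data structures:

```cpp
#include <cstddef>
#include <vector>

// Toy model of a relocatable heap: each slot either holds a block id
// (>= 0) or is a free hole (-1). Real blocks would be variable-sized
// and moved with memcpy plus pointer fix-up.
struct RelocatableHeap {
    std::vector<int> slots;
    static constexpr int kMaxShiftsPerFrame = 8;  // the small N from the text

    // Shift at most N blocks down toward the start of the heap.
    // Returns the number of shifts actually performed this frame.
    int DefragmentOneFrame() {
        int shifts = 0;
        for (std::size_t hole = 0;
             hole + 1 < slots.size() && shifts < kMaxShiftsPerFrame;
             ++hole) {
            if (slots[hole] != -1) continue;  // slot is occupied
            // Find the next occupied slot above the hole and move it down.
            std::size_t next = hole + 1;
            while (next < slots.size() && slots[next] == -1) ++next;
            if (next == slots.size()) break;  // nothing left to move
            slots[hole] = slots[next];
            slots[next] = -1;
            ++shifts;
        }
        return shifts;
    }

    // Fully defragmented when no occupied slot follows a free one.
    bool IsDefragmented() const {
        bool seenHole = false;
        for (int id : slots) {
            if (id == -1) seenHole = true;
            else if (seenHole) return false;
        }
        return true;
    }
};
```

Calling `DefragmentOneFrame()` once per game frame bounds the relocation cost, so a mostly defragmented heap converges to fully defragmented over a handful of frames.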
5.2.3. Cache Coherency
To understand why memory access patterns affect performance, we need
first to understand how modern processors read and write memory. Access-
ing main system RAM is always a slow operation, often taking thousands of
processor cycles to complete. Contrast this with a register access on the CPU
itself, which takes on the order of tens of cycles or sometimes even a single
cycle. To reduce the average cost of reading and writing to main RAM, mod-
ern processors utilize a high-speed memory cache.
A cache is a special type of memory that can be read from and written to
by the CPU much more quickly than main RAM. The basic idea of memory
caching is to load a small chunk of memory into the high-speed cache when-
ever a given region of main RAM is first read. Such a memory chunk is called
a cache line and is usually between 8 and 512 bytes, depending on the micro-
processor architecture. On subsequent read operations, if the requested data
already exists in the cache, it is loaded from the cache directly into the CPU’s
registers—a much faster operation than reading from main RAM. Only if the
required data is not already in the cache does main RAM have to be accessed.
This is called a cache miss. Whenever a cache miss occurs, the program is forced
to wait for the cache line to be refreshed from main RAM.
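One way to see why access patterns matter is to count how many distinct cache lines a sequence of reads touches; on a cold cache, every new line is a miss. The sketch below assumes a hypothetical 64-byte cache line (as noted above, the actual size varies by architecture), and `CountLinesTouched` is an illustrative helper, not a real API:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed cache-line size for illustration; real hardware varies
// (the text gives a range of roughly 8 to 512 bytes).
constexpr std::size_t kCacheLineSize = 64;

// Count how many distinct cache lines are touched when reading
// `count` floats from `base` with a stride of `strideElems` elements.
// Each distinct line corresponds to one cold-cache miss.
std::size_t CountLinesTouched(const float* base, std::size_t count,
                              std::size_t strideElems) {
    std::set<std::uintptr_t> lines;
    for (std::size_t i = 0; i < count; ++i) {
        auto addr = reinterpret_cast<std::uintptr_t>(base + i * strideElems);
        lines.insert(addr / kCacheLineSize);  // line index containing addr
    }
    return lines.size();
}
```

A sequential pass over an array amortizes each miss across every element sharing the line (here, 16 four-byte floats per line), whereas a stride of one full line or more pays a miss on every single access.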
Similar rules may apply when writing data to RAM. The simplest kind
of cache is called a write-through cache; in such a cache design, all writes to
the cache are simply mirrored to main RAM immediately. However, in a
write-back (or copy-back) cache design, data is first written into the cache and
the cache line is only flushed out to main RAM under certain circumstances,
such as when a dirty cache line needs to be evicted in order to read in a new