A load-hit-store is a particularly bad kind of cache miss, prevalent on the PowerPC architectures found in the Xbox 360 and PLAYSTATION 3, in which the CPU writes data to a memory address and then reads that data back before the write has had a chance to make its way through the CPU’s instruction pipeline and out into the L1 cache. See http://assemblyrequired.crashworks.org/2008/07/08/load-hit-stores-and-the-__restrict-keyword for more details.
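To make this concrete, here is a minimal, hypothetical C++ sketch of the kind of code that can trigger a load-hit-store. The function names are invented for illustration, and whether the compiler can actually keep the accumulator in a register depends on the target platform and optimizer.

// Without __restrict, the compiler must assume that 'out' might alias
// 'a' or 'b'. It therefore stores *out to memory and reloads it on
// every iteration -- a write followed immediately by a read of the
// same address, which is exactly the load-hit-store pattern.
void accumulateSlow(const float* a, const float* b, float* out, int n)
{
    for (int i = 0; i < n; ++i)
    {
        *out += a[i] * b[i];  // store to *out, then load it right back
    }
}

// With __restrict, the programmer promises that the pointers do not
// alias, so the compiler is free to keep the running sum in a register
// and write it to memory only once, after the loop.
void accumulateFast(const float* __restrict a,
                    const float* __restrict b,
                    float* __restrict out, int n)
{
    float sum = *out;
    for (int i = 0; i < n; ++i)
    {
        sum += a[i] * b[i];   // no memory traffic for the accumulator
    }
    *out = sum;
}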
5.2.3.2. Instruction Cache and Data Cache
When writing high-performance code for a game engine or for any other performance-critical system, it is important to realize that both data and code are
cached. The instruction cache (I-cache) is used to preload executable machine
code before it runs, while the data cache (D-cache) is used to speed up reading
and writing of data to main RAM. Most processors separate the two caches
physically. Hence it is possible for a program to slow down because of an I-cache miss or because of a D-cache miss.
5.2.3.3. Avoiding Cache Misses
The best way to avoid D-cache misses is to organize your data in contiguous
blocks that are as small as possible and then access them sequentially. This
yields the minimum number of cache misses. When the data is contiguous
(i.e., you don’t “jump around” in memory a lot), a single cache miss will load
the maximum amount of relevant data in one go. When the data is small, it
is more likely to fit into a single cache line (or at least a minimum number
of cache lines). And when you access your data sequentially (i.e., you don’t
“jump around” within the contiguous memory block), you achieve the mini-
mum number of cache misses, since the CPU never has to reload a cache line
from the same region of RAM.
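As a rough illustration of this principle, the following hypothetical C++ fragment contrasts a cache-friendly layout (a contiguous array traversed in order) with a cache-hostile one (the same data reached through scattered heap allocations). The struct and function names are invented for the example.

#include <vector>

struct Particle
{
    float x, y, z;
    float vx, vy, vz;
};

// Cache-friendly: the particles live in one contiguous block, and we
// walk it sequentially. Each cache line loaded brings in several
// particles' worth of relevant data.
void updateContiguous(std::vector<Particle>& particles, float dt)
{
    for (Particle& p : particles)
    {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Cache-hostile: each particle was allocated separately, so consecutive
// pointers may refer to wildly different addresses. Every dereference
// risks a fresh D-cache miss.
void updateScattered(std::vector<Particle*>& particles, float dt)
{
    for (Particle* p : particles)
    {
        p->x += p->vx * dt;
        p->y += p->vy * dt;
        p->z += p->vz * dt;
    }
}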
Avoiding I-cache misses follows the same basic principle as avoiding D-cache misses. However, the implementation requires a different approach. The compiler and linker dictate how your code is laid out in memory, so you might think you have little control over I-cache misses. However, most C/C++ linkers follow some simple rules that you can leverage, once you know what they are (see the sketch following this list):
- The machine code for a single function is almost always contiguous in memory. That is, the linker almost never splits a function up in order to intersperse another function in the middle. (Inline functions are the exception to this rule; more on this topic below.)
- Functions are laid out in memory in the order they appear in the translation unit’s source code (.cpp file).
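As a hedged illustration of how these rules might be exploited, the sketch below keeps a hot loop and its helper adjacent in the same .cpp file, so that a typical linker will emit their machine code back to back and let them share I-cache lines. The function names are invented, and the actual layout is ultimately up to your particular toolchain.

// hot_path.cpp
//
// Because most linkers emit functions in the order they appear in the
// translation unit, defining the helper immediately before its caller
// makes it likely that both end up in nearby I-cache lines.

namespace
{
    // Small helper called on every iteration of the hot loop below.
    // Defined right before its caller so their code sits together.
    float applyDamping(float velocity, float damping)
    {
        return velocity * damping;
    }
}

// The hot loop itself. Keeping it small and next to applyDamping()
// minimizes the number of distinct I-cache lines the loop touches.
void integrateVelocities(float* velocities, int count, float damping)
{
    for (int i = 0; i < count; ++i)
    {
        velocities[i] = applyDamping(velocities[i], damping);
    }
}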