cache line from main RAM, or when the program explicitly requests a flush to
occur.
Obviously cache misses cannot be totally avoided, since data has to move
to and from main RAM eventually. However, the trick to high-performance
computing is to arrange your data in RAM and code your algorithms in such
a way that the minimum number of cache misses occur. We’ll see exactly how
to accomplish this below.
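For example, consider summing a batch of floating-point values stored in two different ways. The following sketch is purely illustrative (the SumArray() and SumList() functions are hypothetical and not drawn from any particular engine): a tightly packed array is traversed in address order, so each cache line fetched from main RAM is fully used, whereas a linked list whose nodes are scattered across the heap can incur a cache miss on nearly every access.

#include <cstddef>

struct Node
{
    float value;
    Node* next;
};

// Cache-friendly: data packed contiguously, accessed sequentially.
// Neighboring elements share a cache line, so one miss serves many reads.
float SumArray(const float* values, std::size_t count)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
    {
        sum += values[i];
    }
    return sum;
}

// Cache-unfriendly: each pointer hop may land on a different cache line,
// so almost every node access can trigger a fresh miss.
float SumList(const Node* head)
{
    float sum = 0.0f;
    for (const Node* p = head; p != nullptr; p = p->next)
    {
        sum += p->value;
    }
    return sum;
}

This is one reason game engines tend to favor contiguous arrays of tightly packed data over node-based containers.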
5.2.3.1. Level 1 and Level 2 Caches
When caching techniques were first developed, the cache memory was located
on the motherboard, constructed from a faster and more expensive type
of memory module than main RAM in order to give it the required boost in
speed. However, cache memory was expensive, so the cache size was usually
quite small—on the order of 16 kB. As caching techniques evolved, an even
faster type of cache memory was developed that was located on the CPU die
itself. This gave rise to two distinct types of cache memory: an on-die level 1
(L1) cache and an on-motherboard level 2 (L2) cache. More recently, the L2
cache has also migrated onto the CPU die (see Figure 5.7).
The rules for moving data back and forth between main RAM and the CPU are of
course complicated by the presence of a level 2 cache. Now, instead of data hopping
from RAM to cache to CPU and back again, it must make two hops—first from
main RAM to the L2 cache, and then from L2 cache to L1 cache. We won’t go
into the specifics of these rules here. (They differ slightly from CPU to CPU
anyway.) But suffice it to say that RAM is slower than L2 cache memory, and
L2 cache is slower than L1 cache. Hence L2 cache misses are usually more
expensive than L1 cache misses, all other things being equal.
Figure 5.7. Level 1 and level 2 caches. (The figure shows the CPU accessing a fast on-die L1 cache, backed by a slower on-die L2 cache, which in turn is backed by the slowest tier, main RAM.)