Collective Wisdom from the Experts 93
Complexity analysis is measured in terms of an abstract machine, but software runs on real machines. Modern computer systems are organized as hierarchies of physical and virtual machines, including language runtimes, operating systems, CPUs, cache memory, random-access memory, disk drives, and networks. The table below shows the limits on random access time and storage capacity for a typical networked server.
Note that capacity and speed vary by several orders of magnitude. Caching and lookahead are used heavily at every level of our systems to hide this variation, but they only work when access is predictable. When cache misses are frequent, the system thrashes. For example, randomly inspecting every byte on a hard drive could take 32 years; even randomly inspecting every byte of RAM could take 11 minutes. Random access is not predictable. What is? That depends on the system, but reaccessing recently used items and accessing items sequentially are usually a win.
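The effect is easy to see even from a high-level language. The sketch below (plain Python, illustrative only; the absolute numbers depend on your machine, and the gap is far larger in compiled code) sums the same array twice, once in sequential order and once in a shuffled order, so the only difference is the access pattern:

```python
import random
import time

N = 1_000_000
data = list(range(N))

# Sequential pass: the hardware prefetcher can predict the next cache line.
start = time.perf_counter()
seq_sum = sum(data[i] for i in range(N))
seq_time = time.perf_counter() - start

# Random pass over the same elements: every access is a potential cache miss.
order = list(range(N))
random.shuffle(order)
start = time.perf_counter()
rand_sum = sum(data[i] for i in order)
rand_time = time.perf_counter() - start

# Same work, same result, different access pattern.
print(f"sequential: {seq_time:.3f}s  random: {rand_time:.3f}s")
```

Both passes touch exactly the same elements and compute the same sum; any difference in the reported times comes from predictability of access alone.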
Algorithms and data structures vary in how effectively they use caches. For example:
- Linear search makes good use of lookahead, but requires O(n) comparisons.
- Binary search of a sorted array requires only O(log(n)) comparisons.
- Search of a van Emde Boas tree is O(log(n)) and cache-oblivious.
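To make the comparison counts concrete, here is a minimal sketch of the first two strategies (the function names and explicit counters are illustrative, not from the text). Linear search probes consecutive memory; binary search jumps around a sorted array:

```python
def linear_search(arr, target):
    """Scan left to right; sequential access, O(n) comparisons."""
    comparisons = 0
    for i, x in enumerate(arr):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons

def binary_search(arr, target):
    """Halve a sorted array each step; O(log n) comparisons,
    but the probe sequence hops unpredictably through memory."""
    comparisons = 0
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if arr[mid] == target:
            return mid, comparisons
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons

arr = list(range(0, 8192, 2))   # 4,096 sorted integers
print(linear_search(arr, 8190)) # worst case: n comparisons
print(binary_search(arr, 8190)) # about log2(n) comparisons
```

The counters show the asymptotic story; as the chapter's measurements demonstrate, they do not by themselves decide which method is fastest on a real machine.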
How to choose? In the last analysis, by measuring. The table below shows the time required to search arrays of 64-bit integers via these three methods. On my computer:
- Linear search is competitive for small arrays, but loses exponentially for larger arrays.
- van Emde Boas wins hands down, thanks to its predictable access pattern.
Search time (ns)

    n     Linear   Binary    vEB
    8         50       90     40
   64        180      150     70
  512      1,200      230    100
4,096     17,000      320    160
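Measurements like these can be gathered with a small harness. The sketch below times only the first two methods (a van Emde Boas layout is more involved and omitted here); `arr.index` stands in for linear search and `bisect.bisect_left` for binary search, so the numbers will differ from the table's, which were taken on the author's machine:

```python
import bisect
import random
import time

def bench(n, trials=10_000):
    """Average per-search time in nanoseconds over random targets
    drawn from a sorted array of n integers."""
    arr = list(range(n))
    targets = [random.randrange(n) for _ in range(trials)]

    t0 = time.perf_counter_ns()
    for t in targets:
        arr.index(t)                 # linear scan
    linear_ns = (time.perf_counter_ns() - t0) / trials

    t0 = time.perf_counter_ns()
    for t in targets:
        bisect.bisect_left(arr, t)   # binary search
    binary_ns = (time.perf_counter_ns() - t0) / trials
    return linear_ns, binary_ns

for n in (8, 64, 512, 4096):
    lin, bi = bench(n)
    print(f"{n:>5}: linear {lin:8.0f} ns  binary {bi:8.0f} ns")
```

Running this yourself is the point: the crossover between the methods depends on your cache sizes and your runtime, not on the asymptotic analysis alone.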
You pays your money and you takes your choice.
             Access time   Capacity
Register     < 1 ns        64 b
Cache line                 64 B
L1 cache     1 ns          64 KB
L2 cache     4 ns          8 MB
RAM          20 ns         32 GB
Disk         10 ms         10 TB
LAN          20 ms         > 1 PB
Internet     100 ms        > 1 ZB