10. The Rendering Engine
10.2.4. A Brief History of the GPU
In the early days of game development, all rendering was done on the CPU.
Games like Wolfenstein 3D and Doom pushed the limits of what early PCs
could do, rendering interactive 3D scenes without any help from
specialized graphics hardware (other than a standard VGA card).
As the popularity of these and other PC games took off, graphics
hardware was developed to offload work from the CPU. The earliest graphics
accelerators, like 3Dfx’s Voodoo line of cards, handled only the most
expensive stage in the pipeline: the rasterization stage. Subsequent graphics
accelerators provided support for the geometry processing stage as well.
At first, graphics hardware provided only a hard-wired but configurable
implementation known as the fixed-function pipeline. This technology was
known as hardware transformation and lighting, or hardware T&L for short.
Later, certain substages of the pipeline were made programmable. Engineers
could now write programs called shaders to control exactly how the pipeline
processed vertices (vertex shaders) and fragments (fragment shaders, more
commonly known as pixel shaders). With the introduction of DirectX 10, a
third type of shader known as a geometry shader was added. It permits
rendering engineers to modify, cull, or create entire primitives (triangles,
lines, and points).
Graphics hardware has evolved around a specialized type of microprocessor
known as the graphics processing unit or GPU. A GPU is designed to
maximize throughput of the pipeline, which it achieves through massive
parallelization. For example, a modern GPU like the GeForce 8800 can process
128 vertices or fragments simultaneously.
Even in its fully programmable form, a GPU is not a general-purpose
microprocessor, nor should it be. A GPU achieves its high processing speeds
(on the order of teraflops on today’s GPUs) by carefully controlling the flow
of data through the pipeline. Certain pipeline stages are either entirely
fixed in their function, or they are configurable but not programmable.
Memory can only be accessed in controlled ways, and specialized data caches
are used to minimize unnecessary duplication of computations.
In the following sections, we’ll briefly explore the architecture of a
modern GPU and see how the runtime portion of the rendering pipeline is
typically implemented. We’ll speak primarily about current GPU architectures,
which are used on personal computers with the latest graphics cards and on
console platforms like the Xbox 360 and the PS3. However, not all platforms
support all of the features we’ll be discussing here. For example, the Wii
does not support programmable shaders, and most PC games need to provide
fallback rendering solutions for older graphics cards with only limited
programmable shader support.