Custom PC – October 2019

17 years ago, and when AMD acquired
the company in 2006, that architectural
generation was still the one on the market.
RDNA is one such infrequent generational
change for the company. Each of the
improvements makes good sense in
isolation, but it’s only when we tie them
all up at the end of the explanation that
the whole theme of the design becomes
clear. Before we start, though, let’s run
through a quick refresher on shaders.
Triangles sent to the GPU are
transformed and animated to be
on the screen in the right place
according to a program that’s run
for each vertex of the triangle,
called a vertex shader. Those
triangles are processed to work
out the pixels they’ll touch on the screen, and
then those pixel samples are individually
painted according to a program that’s run for
each one, called a pixel shader. There’s a third
common type of program that operates on
general data, called the compute shader.
All those shader programs are executed
across sets of vertices or pixels in highly
parallel fashion by the GPU, in what's called
SIMD (single instruction, multiple data)
execution: the GPU runs the same instruction
on each piece of data in the set, in parallel.
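As a minimal sketch of that idea, here's SIMD execution simulated in Python: one instruction (a multiply-add) applied to every lane of a wave in lockstep. The wave width and the data are illustrative, not tied to any real GPU.

```python
WAVE_WIDTH = 8  # hypothetical width, chosen only for illustration


def simd_mad(a, b, c):
    """Run the same multiply-add instruction across every lane of a wave."""
    assert len(a) == len(b) == len(c) == WAVE_WIDTH
    # Conceptually, all eight of these operations happen together,
    # in lockstep, as a single instruction.
    return [a[i] * b[i] + c[i] for i in range(WAVE_WIDTH)]


print(simd_mad([1] * 8, [2] * 8, [3] * 8))  # every lane computes 1*2+3 = 5
```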


The name for a set of either vertices or
pixels that executes in SIMD fashion (and
in practice, there are some other types
of parallel-processed data too) changes
depending on who you talk to. That makes
it difficult to unify the terminology, but
they’re all talking about the same group of
work, whether you see people referring
to it as a warp, grid, wavefront, wave,
workgroup, task, thread group and so on.
We’ll use ‘wave’ from this point onwards.

The ‘width’ of the SIMD aspect of the
machine is really important, because
shader programs can have branches in
them. Imagine a pixel shader that needs
to decide which texture to apply based on
some property of the renderer. Maybe it’s
applying a shiny material – some pixels in the
wave might take one branch, some might
take another. The way it works is the GPU
still runs that pixel shader for all pixels in the
wave, and just masks out the results from
those that didn’t take the current branch.
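That masking behaviour can be sketched as follows. Both sides of the branch run for all lanes in the wave, and a per-lane mask selects which result each lane actually keeps; the function names and values here are made up for illustration.

```python
def run_branch(values):
    # The per-lane "branch condition" - e.g. "is this pixel shiny?"
    mask = [v > 0.5 for v in values]
    # The taken path is executed for ALL lanes in the wave...
    shiny = [v * 2.0 for v in values]
    # ...and so is the not-taken path.
    matte = [v * 0.5 for v in values]
    # The mask then picks the result each lane keeps; the other
    # path's work for that lane is simply thrown away.
    return [s if m else d for m, s, d in zip(mask, shiny, matte)]


print(run_branch([0.25, 0.75, 1.0, 0.0]))  # → [0.125, 1.5, 2.0, 0.0]
```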

Imagine your GPU can run a single
64-wide wave, and that just one of the
pixels in the wave took the current branch.
The machine runs at 1/64th the potential
peak performance for that branch, which is
potentially hundreds or thousands of cycles
long. SIMD width is therefore a trade-off
for the GPU designer to make. Making it
wider is simpler in hardware, which usually
means less area and power, but it also
means the performance penalty for lightly
taken branches is higher.

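The back-of-envelope arithmetic for that penalty is simple: if only a few lanes of a wave take a branch, the SIMD hardware still spends the branch's full cycle count, so utilisation for that stretch is active lanes divided by wave width.

```python
def branch_utilization(active_lanes, wave_width):
    """Fraction of peak throughput achieved while executing a branch."""
    return active_lanes / wave_width


# One active pixel in a 64-wide wave: 1/64th of peak for the
# whole duration of the branch.
print(branch_utilization(1, 64))  # 0.015625
# The same lone pixel in a 32-wide wave wastes half as much.
print(branch_utilization(1, 32))  # 0.03125
```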
The most apparent change
to RDNA is in the SIMD width.
Whereas GCN was a 64-wide
machine, meaning it executed
a 64-wide wave of threads,
RDNA is a 32-wide machine at its
heart. On top of that, GCN ran each 64-wide
wave over four clock cycles on a 16-wide
SIMD unit. RDNA runs each 32-wide wave
in a single cycle on a 32-wide SIMD unit.
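Put numerically, the issue rate works out like this: each instruction has to be replayed until every lane of the wave has been covered, so cycles per instruction is wave width divided by SIMD width.

```python
def cycles_per_instruction(wave_width, simd_width):
    # Each instruction repeats until all lanes of the wave are covered.
    return wave_width // simd_width


print(cycles_per_instruction(64, 16))  # GCN: a 64-wide wave takes 4 clocks
print(cycles_per_instruction(32, 32))  # RDNA: a 32-wide wave takes 1 clock
```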
RDNA also changes how work is
distributed across a collection of those SIMD
units. GCN distributes work across groups
of four 16-wide SIMD units at a time in a
collective block it calls the Compute Unit
(CU). In comparison, RDNA distributes its
work across a pair of 32-wide SIMD units
in a collective block it calls the Workgroup

IT PROVIDES A VERY HEALTHY SPEED-UP IN
MODERN SHADER CODE JUST FROM THOSE CHANGES

The new shading microarchitecture in RDNA is only AMD’s fourth completely new programmable shader core design since 2001
