Custom PC – October 2019

17 years ago, and when AMD acquired
the company in 2006, that architectural
generation was still the one on the market.
RDNA is one such infrequent generational
change for the company. Each of the
improvements makes good sense in
isolation, but it’s only when we tie them
all up at the end of the explanation that
the whole theme of the design becomes
clear. Before we start, though, let’s run
through a quick refresher on shaders.
Triangles sent to the GPU are
transformed and animated to be
on the screen in the right place
according to a program that’s run
for each vertex of the triangle,
called a vertex shader. Those
triangles are processed to work
out the pixels they’ll touch on the screen, and
then those pixel samples are individually
painted according to a program that’s run for
each one, called a pixel shader. There’s a third
common type of program that operates on
general data, called the compute shader.
All those shader programs are executed
across sets of vertices or pixels in highly
parallel fashion by the GPU, in what's called
SIMD (single instruction, multiple data)
execution: the GPU runs the same instruction
on each piece of data in the set, in parallel.
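As a minimal sketch of that idea, here's SIMD execution simulated in Python: one instruction (a multiply-add) applied to every lane of a wave in lockstep. The wave width and the data are illustrative, not tied to any real GPU.

```python
WAVE_WIDTH = 8  # hypothetical width, chosen only for illustration


def simd_mad(a, b, c):
    """Run the same multiply-add instruction across every lane of a wave."""
    assert len(a) == len(b) == len(c) == WAVE_WIDTH
    # Conceptually, all eight of these operations happen together,
    # in lockstep, as a single instruction.
    return [a[i] * b[i] + c[i] for i in range(WAVE_WIDTH)]


print(simd_mad([1] * 8, [2] * 8, [3] * 8))  # every lane computes 1*2+3 = 5
```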


The name for a set of either vertices or
pixels that executes in SIMD fashion (and
in practice, there are some other types
of parallel-processed data too) changes
depending on who you talk to. That makes
it difficult to unify the terminology, but
they’re all talking about the same group of
work, whether you see people referring
to it as a warp, grid, wavefront, wave,
workgroup, task, thread group and so on.
We’ll use ‘wave’ from this point onwards.

The ‘width’ of the SIMD aspect of the
machine is really important, because
shader programs can have branches in
them. Imagine a pixel shader that needs
to decide which texture to apply based on
some property of the renderer. Maybe it’s
applying a shiny material – some pixels in the
wave might take one branch, some might
take another. The way it works is the GPU
still runs that pixel shader for all pixels in the
wave, and just masks out the results from
those that didn’t take the current branch.
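That masking behaviour can be sketched as follows. Both sides of the branch run for all lanes in the wave, and a per-lane mask selects which result each lane actually keeps; the function names and values here are made up for illustration.

```python
def run_branch(values):
    # The per-lane "branch condition" - e.g. "is this pixel shiny?"
    mask = [v > 0.5 for v in values]
    # The taken path is executed for ALL lanes in the wave...
    shiny = [v * 2.0 for v in values]
    # ...and so is the not-taken path.
    matte = [v * 0.5 for v in values]
    # The mask then picks the result each lane keeps; the other
    # path's work for that lane is simply thrown away.
    return [s if m else d for m, s, d in zip(mask, shiny, matte)]


print(run_branch([0.25, 0.75, 1.0, 0.0]))  # → [0.125, 1.5, 2.0, 0.0]
```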

Imagine your GPU can run a single
64-wide wave, and that just one of the
pixels in the wave took the current branch.
The machine runs at 1/64th the potential
peak performance for that branch, which is
potentially hundreds or thousands of cycles
long. SIMD width is therefore a trade-off
for the GPU designer to make. Making it
wider is simpler in hardware, which usually
means less area and power, but it also
means the performance penalty for lightly
taken branches is higher.

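The back-of-envelope arithmetic for that penalty is simple: if only a few lanes of a wave take a branch, the SIMD hardware still spends the branch's full cycle count, so utilisation for that stretch is active lanes divided by wave width.

```python
def branch_utilization(active_lanes, wave_width):
    """Fraction of peak throughput achieved while executing a branch."""
    return active_lanes / wave_width


# One active pixel in a 64-wide wave: 1/64th of peak for the
# whole duration of the branch.
print(branch_utilization(1, 64))  # 0.015625
# The same lone pixel in a 32-wide wave wastes half as much.
print(branch_utilization(1, 32))  # 0.03125
```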
The most apparent change
to RDNA is in the SIMD width.
Whereas GCN was a 64-wide
machine, meaning it executed
a 64-wide wave of threads,
RDNA is a 32-wide machine at its
heart. On top of that, GCN ran each 64-wide
wave over four clock cycles on a 16-wide
SIMD unit. RDNA runs each 32-wide wave
in a single cycle on a 32-wide SIMD unit.
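Put numerically, the issue rate works out like this: each instruction has to be replayed until every lane of the wave has been covered, so cycles per instruction is wave width divided by SIMD width.

```python
def cycles_per_instruction(wave_width, simd_width):
    # Each instruction repeats until all lanes of the wave are covered.
    return wave_width // simd_width


print(cycles_per_instruction(64, 16))  # GCN: a 64-wide wave takes 4 clocks
print(cycles_per_instruction(32, 32))  # RDNA: a 32-wide wave takes 1 clock
```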
RDNA also changes how work is
distributed across a collection of those SIMD
units. GCN distributes work across groups
of four 16-wide SIMD units at a time in a
collective block it calls the Compute Unit
(CU). In comparison, RDNA distributes its
work across a pair of 32-wide SIMD units
in a collective block it calls the Workgroup

IT PROVIDES A VERY HEALTHY SPEED-UP IN
MODERN SHADER CODE JUST FROM THOSE CHANGES

The new shading microarchitecture in RDNA is only AMD’s fourth completely new programmable shader core design since 2001
