Custom PC – October 2019

PCI-E 4 AND GDDR6
I/O is usually the last bit of a GPU design
that anyone discusses, but let’s make it
the first stop on our tour of what makes
RDNA tick. Talking to the outside world
is fraught with danger as far as a GPU is
concerned because of the latency of doing
so. Because a GPU tends to want to work
on large chunks of data that need to be fed
into and out of the chip as fast as possible,
bandwidth is prioritised over latency, which
is anathema to how a CPU wants to work.
That also feeds into thinking about GPU
parallelism, because high bandwidth tends
to mean high latency, for both accesses to
the DRAM memory directly connected to the
GPU, and accesses over the PCI-E connection
to the host CPU. The GPU needs to be able
to keep working while those accesses
return the data it needs, an ability that GPU
designers call latency tolerance. The tolerance
is achieved by making the GPU wider and
therefore able to do more work in parallel.
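As an illustrative sketch only (not AMD's actual scheduler), a toy Little's-law-style model shows why width buys latency tolerance: with enough independent wavefronts in flight, the compute hardware stays busy while memory requests are outstanding. The numbers below are made-up cycle counts, chosen purely for demonstration.

```python
def utilisation(wavefronts: int, compute_cycles: int, memory_latency: int) -> float:
    """Fraction of time the GPU stays busy, in a toy round-robin model.

    Each wavefront alternates between `compute_cycles` of ALU work and a
    memory access that takes `memory_latency` cycles to return. While one
    wavefront waits on memory, the scheduler can run the others.
    """
    # Work available per latency window, versus the window's length.
    busy = wavefronts * compute_cycles
    window = compute_cycles + memory_latency
    return min(1.0, busy / window)

# A single wavefront stalls almost all the time...
print(f"{utilisation(1, 20, 380):.2f}")   # 0.05
# ...but 20 independent wavefronts fully hide a 380-cycle latency.
print(f"{utilisation(20, 20, 380):.2f}")  # 1.00
```

The same arithmetic explains why high-bandwidth, high-latency memory suits GPUs and not CPUs: a CPU rarely has 20 independent streams of work ready to switch between.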
AMD chose GDDR6 as the directly
connected memory on Navi10, the first
implementation of RDNA, and for very
good reasons compared with the HBM2
memory used on Vega10 and Vega20,
which powered the RX Vega and Radeon VII
high-end products – cost. HBM2 achieves
its incredibly high bandwidth with a very
wide parallel connection to the GPU. The
high number of parallel connections,
each an individual wire, means HBM2
memory needs to be physically very close
to the device to which it's connected.
This setup results in special packaging that
uses an interposer – a special silicon chip that
acts as the connective substrate between
HBM2 memory and whatever it's connected
to. The HBM2 stack and interposer are both
expensive to make, due to the complexities
of manufacturing and packaging the HBM2
stack, and the sheer size of the interposer.
Then they have to be physically bonded to
each other, along with the GPU, and tested.
That extra packaging and testing work adds
hundreds of dollars to the cost of any HBM2
product, and it's why, despite its technical
brilliance, you don't see it used everywhere
that high-performance memory is required.
In order to bring down costs to sensible
levels, AMD turned instead to a 256-bit-
wide GDDR6 interface for Navi10. GDDR6
provides bandwidth in the same ballpark as
HBM2 — almost 450GB/sec in a Radeon RX
5700 XT across its 256-bit bus — without
the extra packaging and test costs.
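That ballpark figure is easy to sanity-check. Assuming the RX 5700 XT's publicly quoted 14Gbps-per-pin GDDR6 data rate (a figure from the card's spec sheet, not stated in this article), peak bandwidth is simply bus width times per-pin rate:

```python
bus_width_bits = 256   # Navi10's GDDR6 interface width
pin_rate_gbps = 14     # per-pin transfer rate (assumed RX 5700 XT spec)

# Each of the 256 data pins moves 14 gigabits per second;
# divide by 8 to convert bits to bytes.
bandwidth_gb_s = bus_width_bits * pin_rate_gbps / 8
print(bandwidth_gb_s)  # 448.0 -- the "almost 450GB/sec" quoted above
```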
The PCB for the product becomes larger,
because the GDDR6 memory chips can’t live
as close to the GPU as with HBM2, but the cost
savings for AMD are substantial and allow
the company to sell the Radeon RX 5700 XT
for just £380 inc VAT and still make a healthy
profit. Fun fact: AMD not only uses GDDR6,
but also contributed significantly to inventing
it; AMD Product CTO, Joe Macri, is a founding
father of the JEDEC standards process that
created, fostered and popularised graphics-
specific GDDR memory, as well as HBM.
As for getting data into those GDDR6
memory chips, the CPU sends it across a
PCI-E 4 bus on Navi10, and you can expect
AMD to use that brand-new version of PCI-E
for every other standalone consumer graphics
product it releases from here on.

AMD HAS ACCELERATED PAST INTEL ON THE
PROCESSOR FRONT AND IS AGGRESSIVELY
RENASCENT WHEN IT COMES TO GRAPHICS



KNOW YOUR INITIALS


ALU: Arithmetic logic unit

GCN: Graphics Core Next (AMD's GPU microarchitecture from 2012 to 2019)

CU: Compute unit

FMA: Fused multiply-add

HBM: High-bandwidth memory

LDS: Local data share

RDNA: AMD's new GPU microarchitecture, introduced in 2019

SFU: Special function unit

SIMD: Single instruction, multiple data

VGPR: Vector general-purpose registers

WGP: Workgroup processor

The move from HBM to GDDR6 requires
a larger PCB, but enables AMD to
make substantial cost savings
