Maximum PC - USA (2022-01)



Jarred Walton

TECH TALK

Jarred Walton has been a PC and gaming enthusiast for over 30 years.


Multi-Chiplet GPUs could be the future of graphics

USING MULTIPLE GPUS—or any processor—to accelerate a task by breaking it up into smaller, independent pieces isn't exactly a new idea. 3dfx did that with its Scan Line Interleave (SLI) technology back in the late 1990s, and Nvidia's SLI (Scalable Link Interface) and AMD's CrossFire have been doing it since 2004 and 2005, respectively.

To date, all implementations of multi-GPU rendering have had drawbacks that prevented ideal scaling. The biggest hurdle is figuring out how best to break up the task. GPUs already split workloads into thousands of queues of instructions running in parallel, but splitting that across multiple GPUs separated by a relatively slow interface limits scaling. Consider the RTX 3090: it has an internal memory bandwidth of 936GB/s, while the current NVLink connector only provides up to 112GB/s—about one-eighth the bandwidth.

Games generally use alternate frame rendering (AFR) for multi-GPU modes, where every other frame is rendered by the second GPU. But AFR can introduce latency and create a higher CPU load, which limits scaling. Nvidia has abandoned support for 3-way and 4-way SLI for this reason. But what if there were a better way to scale?

AMD's Instinct MI250X GPUs and Intel's Xe-HPC solutions are tackling the problem, but both are intended for data center workloads rather than real-time game rendering. However, there's so much communication between the GPUs that they need much higher bandwidth interfaces, plus they stack multiple chips to boost performance. Look at what AMD did with its Ryzen 3000 and Ryzen 5000 CPUs. AMD created the Infinity Fabric as a high-speed link between the CPU chiplet and

the main I/O die, and on higher core-count models there can be anywhere from two to eight CPU chiplets. Instead of separate chips linked by a bus, MI200 GPUs will stack the chips on an elevated fanout bridge with an ultra-high-bandwidth die interconnect. Intel's Xe-HPC (Ponte Vecchio) stacks 47 chiplets, linked by a high-speed EMIB (embedded multi-die interconnect bridge). Either way, the goal is the same: increase chip-to-chip bandwidth and enable better scaling across GPUs.
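The bandwidth gap driving all of this is easy to put a number on. A trivial calculation, using nothing beyond the RTX 3090 figures quoted earlier, shows the roughly eight-to-one gulf between a GPU's local memory and an SLI-style bridge:

```python
# Back-of-the-envelope check of the bandwidth gap described above.
internal_bw_gbps = 936  # RTX 3090 internal memory bandwidth (GB/s)
nvlink_bw_gbps = 112    # current NVLink connector bandwidth (GB/s)

ratio = internal_bw_gbps / nvlink_bw_gbps
print(f"NVLink carries about 1/{ratio:.0f} of local memory bandwidth")
```

Any work split across two conventional cards has to squeeze shared data through that one-eighth-speed pipe, which is exactly what die-stacked interconnects are meant to avoid.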
It’s like the next evolution of
HBM (high-bandwidth memory),
where multiple stacks are placed
on a silicon interposer that links
with the main GPU to provide gobs
of memory bandwidth. Except now,
instead of memory, Intel and AMD
are stacking processors, cache,
and whatever else might be useful.

The specs of AMD's MI250X look impressive. Each GPU chiplet has 110 compute units (CUs), 440 matrix cores, and 64GB of HBM2e memory. Combined, the GPUs consist of 58bn transistors, 128GB of HBM2e with 3.2TB/s of bandwidth, and 220 CUs that can deliver 48 TFLOPS of FP64 compute—double that for FP64 work done on the matrix cores. On paper, it's nearly five times the compute of Nvidia's A100 solution, and it will be deployed in the Frontier and El Capitan supercomputers.
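The combined figures follow directly from the per-chiplet specs, and the "nearly five times" claim checks out too. In this sketch, the A100 comparison point (19.5 TFLOPS peak FP64 on its tensor cores) is our assumption about which Nvidia number the column is comparing against:

```python
# Sanity-check the combined MI250X figures from the per-chiplet specs.
chiplets = 2
cus_per_chiplet = 110
hbm_per_chiplet_gb = 64

total_cus = chiplets * cus_per_chiplet        # 220 CUs
total_hbm_gb = chiplets * hbm_per_chiplet_gb  # 128GB of HBM2e

fp64_vector_tflops = 48                       # quoted combined FP64 rate
fp64_matrix_tflops = 2 * fp64_vector_tflops   # matrix cores double it

# Assumed comparison point: A100 peak FP64 tensor throughput.
a100_fp64_tensor_tflops = 19.5
print(total_cus, total_hbm_gb,
      round(fp64_matrix_tflops / a100_fp64_tensor_tflops, 2))
```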
When will we see the consumer equivalent? We don't need 128GB of HBM2e or all that FP64 number crunching, just 120 CUs focused on FP32 graphics performance, linked by a similar high-speed interconnect. Even a single 120 CU AMD GPU made on 5nm technology could be up to 50 percent faster than the current Radeon RX 6900 XT. Imagine linking two or four of those GPUs together in a single monster gaming solution. That's a while off, but I look forward to testing the Radeon RX 8950 XT dual-chiplet card in a few years. Fingers crossed.
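For a sense of what that speculation implies, here is a naive scaling model. The 1.5x single-chip estimate is the column's; the 90 percent per-chiplet scaling efficiency is purely an assumed placeholder, since nobody knows how well a consumer interconnect would scale:

```python
# Hypothetical chiplet scaling, relative to an RX 6900 XT at 1.0x.
single_chip_vs_6900xt = 1.5  # the column's ~50 percent estimate

def multi_chiplet_speedup(n_chiplets, efficiency=0.9):
    # Assume each extra chiplet adds `efficiency` of its throughput.
    return single_chip_vs_6900xt * (1 + efficiency * (n_chiplets - 1))

for n in (1, 2, 4):
    print(n, round(multi_chiplet_speedup(n), 2))
```

Even with imperfect scaling, a four-chiplet card under these assumptions would land at several times today's flagship, which is why the idea is so tantalizing.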
All we need is 120 CUs of FP32 graphics performance, linked by a high-speed interconnect.

AMD's Instinct MI250X data center GPU uses stacked chiplets.