Maximum PC - USA (2022-01)



Jarred Walton

TECH TALK

Jarred Walton has been a PC and gaming enthusiast for over 30 years.


Multi-Chiplet GPUs could be the future of graphics

USING MULTIPLE GPUS—or any processor—to accelerate a task by breaking it up into smaller, independent pieces isn't exactly a new idea. 3dfx did that with its Scan Line Interleave (SLI) technology back in the late 1990s, and Nvidia's SLI (Scalable Link Interface) and AMD's CrossFire have been doing it since 2004 and 2005, respectively.

To date, all implementations of multi-GPU rendering have had drawbacks that prevented ideal scaling. The biggest hurdle is figuring out how best to break up the task. GPUs already split workloads into thousands of queues of instructions running in parallel, but splitting that across multiple GPUs separated by a relatively slow interface limits scaling. Consider the RTX 3090: it has an internal memory bandwidth of 936GB/s, while the current NVLink connector only provides up to 112GB/s—about one-eighth the bandwidth.

Games generally use alternate frame rendering (AFR) for multi-GPU modes, where every other frame is rendered by the second GPU. But AFR can introduce latency and create a higher CPU load, which limits scaling. Nvidia has abandoned support for 3-way and 4-way SLI for this reason. But what if there were a better way to scale?

AMD's Instinct MI250X GPUs and Intel's Xe-HPC solutions are tackling the problem, but both are intended for data center workloads rather than real-time game rendering. However, there's so much communication between the GPUs that they need much higher bandwidth interfaces, plus they stack multiple chips to boost performance. Look at what AMD did with its Ryzen 3000 and Ryzen 5000 CPUs. AMD created the Infinity Fabric as a high-speed link between the CPU chiplet and

the main I/O die, and on higher core-count models there can be anywhere from two to eight CPU chiplets. Instead of separate chips linked by a bus, MI200 GPUs will stack the chips on an elevated fanout bridge with an ultra-high-bandwidth die interconnect. Intel's Xe-HPC (Ponte Vecchio) stacks 47 chiplets, linked by a high-speed EMIB (embedded multi-die interconnect bridge). Either way, the goal is the same: increase chip-to-chip bandwidth and enable better scaling across GPUs.
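The bandwidth gap driving all of this is easy to put a number on. A trivial calculation, using nothing beyond the RTX 3090 figures quoted earlier, shows the roughly eight-to-one gulf between a GPU's local memory and an SLI-style bridge:

```python
# Back-of-the-envelope check of the bandwidth gap described above.
internal_bw_gbps = 936  # RTX 3090 internal memory bandwidth (GB/s)
nvlink_bw_gbps = 112    # current NVLink connector bandwidth (GB/s)

ratio = internal_bw_gbps / nvlink_bw_gbps
print(f"NVLink carries about 1/{ratio:.0f} of local memory bandwidth")
```

Any work split across two conventional cards has to squeeze shared data through that one-eighth-speed pipe, which is exactly what die-stacked interconnects are meant to avoid.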
It’s like the next evolution of
HBM (high-bandwidth memory),
where multiple stacks are placed
on a silicon interposer that links
with the main GPU to provide gobs
of memory bandwidth. Except now,
instead of memory, Intel and AMD
are stacking processors, cache,
and whatever else might be useful.

The specs of AMD's MI250X look impressive. Each GPU chiplet has 110 compute units (CUs), 440 matrix cores, and 64GB of HBM2e memory. Combined, the GPUs consist of 58bn transistors, 128GB of HBM2e with 3.2TB/s of bandwidth, and 220 CUs that can deliver 48 TFLOPS of FP64 compute—double that for FP64 work done on the matrix cores. On paper, it's nearly five times the compute of Nvidia's A100 solution, and it will be deployed in the Frontier and El Capitan supercomputers.
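The combined figures follow directly from the per-chiplet specs, and the "nearly five times" claim checks out too. In this sketch, the A100 comparison point (19.5 TFLOPS peak FP64 on its tensor cores) is our assumption about which Nvidia number the column is comparing against:

```python
# Sanity-check the combined MI250X figures from the per-chiplet specs.
chiplets = 2
cus_per_chiplet = 110
hbm_per_chiplet_gb = 64

total_cus = chiplets * cus_per_chiplet        # 220 CUs
total_hbm_gb = chiplets * hbm_per_chiplet_gb  # 128GB of HBM2e

fp64_vector_tflops = 48                       # quoted combined FP64 rate
fp64_matrix_tflops = 2 * fp64_vector_tflops   # matrix cores double it

# Assumed comparison point: A100 peak FP64 tensor throughput.
a100_fp64_tensor_tflops = 19.5
print(total_cus, total_hbm_gb,
      round(fp64_matrix_tflops / a100_fp64_tensor_tflops, 2))
```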
When will we see the consumer equivalent? We don't need 128GB of HBM2e or all that FP64 number crunching, just 120 CUs focused on FP32 graphics performance, linked by a similar high-speed interconnect. Even a single 120 CU AMD GPU made on 5nm technology could be up to 50 percent faster than the current Radeon RX 6900 XT. Imagine linking two or four of those GPUs together in a single monster gaming solution. That's a while off, but I look forward to testing the Radeon RX 8950 XT dual-chiplet card in a few years. Fingers crossed.
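For a sense of what that speculation implies, here is a naive scaling model. The 1.5x single-chip estimate is the column's; the 90 percent per-chiplet scaling efficiency is purely an assumed placeholder, since nobody knows how well a consumer interconnect would scale:

```python
# Hypothetical chiplet scaling, relative to an RX 6900 XT at 1.0x.
single_chip_vs_6900xt = 1.5  # the column's ~50 percent estimate

def multi_chiplet_speedup(n_chiplets, efficiency=0.9):
    # Assume each extra chiplet adds `efficiency` of its throughput.
    return single_chip_vs_6900xt * (1 + efficiency * (n_chiplets - 1))

for n in (1, 2, 4):
    print(n, round(multi_chiplet_speedup(n), 2))
```

Even with imperfect scaling, a four-chiplet card under these assumptions would land at several times today's flagship, which is why the idea is so tantalizing.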
All we need is 120 CUs of FP32 graphics performance, linked by a high-speed interconnect.

AMD's Instinct MI250X data center GPU uses stacked chiplets.