Custom PC - UK (2020-01)


It also housed 4MB of L2 cache paired to a
full 256-bit bus with 8GB of GDDR6 memory
on the other side.
In January this year, Nvidia pushed TU106
into service again in cut-down form, rounding
out the GeForce RTX product line with the
GeForce RTX 2060. This £329 configuration
had only 30 SMs, 3MB of L2 cache, and a
192-bit wide memory interface. That cut-down
bus width meant only 6GB of connected
memory, although the SM boost clock
remained at a healthy 1.68GHz out of the box.
Then, for the AMD Navi spoiler party in July,
the final full Turing product appeared in the
form of the GeForce RTX 2060 Super, with
its 34 SMs, the full 4MB of L2 cache, all 256
bits of memory interface and 8GB of GDDR6
connected to it.
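
The link between bus width and memory capacity in these configurations comes down to simple arithmetic. The sketch below is illustrative only: it assumes one GDDR6 chip per 32-bit channel and 1GB (8Gb) chips, neither of which is stated in the article, and the function name is hypothetical.

```python
# Assumptions (not from the article): each GDDR6 chip sits on its own
# 32-bit channel, and each chip holds 1GB (8Gb).
CHANNEL_WIDTH_BITS = 32
CHIP_CAPACITY_GB = 1


def memory_capacity_gb(bus_width_bits: int) -> int:
    """Return total memory in GB for a bus width, under the assumptions above."""
    if bus_width_bits % CHANNEL_WIDTH_BITS != 0:
        raise ValueError("bus width must be a multiple of the channel width")
    chips = bus_width_bits // CHANNEL_WIDTH_BITS
    return chips * CHIP_CAPACITY_GB


if __name__ == "__main__":
    for card, bus in [("GeForce RTX 2060", 192), ("GeForce RTX 2060 Super", 256)]:
        print(f"{card}: {bus}-bit bus -> {memory_capacity_gb(bus)}GB")
```

Under those assumptions, cutting the bus from 256 bits to 192 bits removes two channels, which is why the RTX 2060 ends up with 6GB rather than 8GB.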


AREA COST AND
BABY TURINGS
When the original Turing products were
released, their size and cost meant there
was a sizeable gap at the lower end of
Nvidia’s product line, at least as far as
new products were concerned. With the range
starting at 445mm² and £329 for a TU106, the
vast majority of overall GPU sales in terms
of volume weren’t accounted for by the
Turing line-up.
The problem was that the additional die
size and cost of ray-tracing acceleration
hardware and AI-focused Tensor cores
(more on which later) meant the full Turing
feature set just wasn’t scalable to lower-end
products. As such, Nvidia decided to play it
safe at the lower end of their product line by
introducing smaller Turing-derived designs
that omit the RT and Tensor cores.
As an aside, it’s worth emphasising just
how area-inefficient the ray-tracing cores are
for a current GPU. Unlike the GPU’s general-
purpose cores and even the Tensor cores,
there’s no secondary-use case for the RT
silicon that would ensure it’s still being used
even when there’s no ray tracing to do. As
such, when ray tracing is off in your game,
the cores are just sitting there doing nothing.
Power management circuitry means
they aren’t actually using any power, but the
manufacturing and purchase cost has still
been sunk into a feature that’s idle except
in a handful of games. Greater ray-tracing
support will come, but it’s a futureproofing
gamble that isn’t well suited to the lower end
of the market.
TU116 is a three-GPC part with eight SMs
per GPC, a 192-bit GDDR6 memory bus,
just 1.5MB of L2 cache, and a very healthy
1.77GHz boost clock for the SMs. With 6.6
billion transistors on 12FFN, TU116 weighs in
at a much more modest 284mm². A full-fat
configuration powers the GeForce GTX 1660
Ti, and a slightly cut-down configuration powers
the GeForce GTX 1660, both with 6GB of memory.
Note that Nvidia dropped the RTX branding
here because real-time ray tracing support is
gone, but these are still Turing-class GPUs.
Last but not least is TU117, the real baby of
the family. It’s a two-GPC part with just seven
SMs per GPC, only 1MB of L2 cache and a smaller
128-bit GDDR5 memory interface with just
4GB of memory connected, and it weighs
in with only 4.7 billion 12FFN transistors at
200mm². Just one desktop product uses
TU117 in its fullest configuration: the GeForce
GTX 1650, starting at around £145.
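
The SM totals for the two baby Turings follow directly from the GPC layouts described above, and the quoted transistor counts and die sizes give a rough density figure. Here is a minimal sketch of that arithmetic, using only numbers from the text; the `Die` class and its field names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    """Hypothetical container for the figures quoted in the text."""
    name: str
    gpcs: int
    sms_per_gpc: int
    transistors_bn: float  # billions of transistors
    area_mm2: float        # die area in mm^2

    @property
    def sms(self) -> int:
        # Total SMs = number of GPCs x SMs in each GPC
        return self.gpcs * self.sms_per_gpc

    @property
    def density_mtr_per_mm2(self) -> float:
        # Millions of transistors per mm^2
        return self.transistors_bn * 1000 / self.area_mm2


DIES = [
    Die("TU116", gpcs=3, sms_per_gpc=8, transistors_bn=6.6, area_mm2=284),
    Die("TU117", gpcs=2, sms_per_gpc=7, transistors_bn=4.7, area_mm2=200),
]

if __name__ == "__main__":
    for die in DIES:
        print(f"{die.name}: {die.sms} SMs, "
              f"{die.density_mtr_per_mm2:.1f}M transistors per mm^2")
```

Both dies work out at a little over 23 million transistors per mm², which is roughly what you'd expect from two chips built on the same 12FFN process.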

DEEP DIVE
Now we’ve taken a look at the product stack
and how it evolved with the RTX Super line-

                       TU102        TU104       TU106       TU116       TU117

CUDA cores             4,608        3,072       2,304       1,536       896
GPCs                   6            6           3           3           2
SMs                    72           48          36          24          14
Texture units          288          192         144         96          56
RT cores               72           48          36          N/A         N/A
Tensor cores           576          384         288         N/A         N/A
ROPs                   96           64          64          48          32
Memory bus width       384-bit      256-bit     256-bit     192-bit     128-bit
L2 cache               6MB          4MB         4MB         1.5MB       1MB
Memory type/max size   24GB GDDR6   8GB GDDR6   8GB GDDR6   6GB GDDR6   4GB GDDR5
Die size               754mm²       545mm²      445mm²      284mm²      200mm²


The Turing line-up spans the whole range of graphics card prices,
but you lose some features in the cheaper products
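
Several rows of the table are tied together by fixed per-SM ratios, which makes it easy to sanity-check. The sketch below uses only the CUDA core, SM, RT core and Tensor core rows; the dictionary layout and the `per_sm` helper are illustrative choices rather than anything from the article.

```python
# Figures from the CUDA cores, SMs, RT cores and Tensor cores rows of
# the table. None marks the N/A entries.
TABLE = {
    "TU102": {"cuda": 4608, "sms": 72, "rt": 72, "tensor": 576},
    "TU104": {"cuda": 3072, "sms": 48, "rt": 48, "tensor": 384},
    "TU106": {"cuda": 2304, "sms": 36, "rt": 36, "tensor": 288},
    "TU116": {"cuda": 1536, "sms": 24, "rt": None, "tensor": None},
    "TU117": {"cuda": 896, "sms": 14, "rt": None, "tensor": None},
}


def per_sm(gpu: str, key: str):
    """Return the count of `key` units per SM, or None if the GPU lacks them."""
    value = TABLE[gpu][key]
    if value is None:
        return None
    return value / TABLE[gpu]["sms"]


if __name__ == "__main__":
    for gpu in TABLE:
        ratios = {key: per_sm(gpu, key) for key in ("cuda", "rt", "tensor")}
        print(gpu, ratios)
```

Every chip comes out at the same number of CUDA cores per SM, while the RT and Tensor core ratios simply disappear on TU116 and TU117, matching the feature cuts described in the text.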

FEATURE / ANALYSIS
