But the benefits of larger neural networks come with tradeoffs. The
more parameters and layers you add to a neural network, the more
expensive its training becomes. According to an estimate by Chuan Li,
the Chief Science Officer of Lambda, a provider of hardware and cloud
resources for deep learning, it could take up to 355 years and $4.6
million to train GPT-3 on a server with a V100 graphics card.
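That figure can be reproduced with a back-of-the-envelope calculation. The short Python sketch below is a rough illustration rather than Li's exact method; the training compute (about 3.14 × 10^23 floating-point operations, as reported in the GPT-3 paper), the V100's sustained throughput, and the cloud price per GPU-hour are all assumed values.

# Rough reproduction of the single-V100 estimate. All three inputs
# are assumptions: GPT-3's reported training compute, a plausible
# sustained V100 throughput, and a typical cloud price per GPU-hour.
TOTAL_FLOPS = 3.14e23       # training compute from the GPT-3 paper (assumed)
V100_FLOPS_PER_SEC = 28e12  # ~28 TFLOPS sustained on one V100 (assumed)
USD_PER_GPU_HOUR = 1.50     # cloud rental rate (assumed)

seconds = TOTAL_FLOPS / V100_FLOPS_PER_SEC
years = seconds / (365.25 * 24 * 3600)
cost = (seconds / 3600) * USD_PER_GPU_HOUR
print(f"{years:,.0f} years, ${cost:,.0f}")  # ~355 years, ~$4.7 million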
“Our calculation with a V100 GPU is extremely simplified. In practice
you can’t train GPT-3 on a single GPU, but with a distributed system
with many GPUs like the one OpenAI used,” Li says. “One will never get
perfect scaling in a large distributed system due to the overhead of
device-to-device communication. So in practice, it will take more than
$4.6 million to finish the training cycle.”
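To see why communication overhead pushes the bill past the idealized figure, consider a toy scaling model. The sketch below is hypothetical; the cluster size and the scaling efficiency are invented for illustration and are not OpenAI's numbers.

# Toy model of imperfect scaling: 'efficiency' is the fraction of the
# cluster's ideal aggregate throughput left after device-to-device
# communication overhead. Both inputs below are invented examples.
SINGLE_GPU_YEARS = 355   # from the single-V100 estimate above
SINGLE_GPU_COST = 4.6e6  # USD, from the same estimate

def distributed_run(num_gpus: int, efficiency: float):
    wall_clock_years = SINGLE_GPU_YEARS / (num_gpus * efficiency)
    total_cost = SINGLE_GPU_COST / efficiency  # billed GPU-hours grow
    return wall_clock_years, total_cost

years, cost = distributed_run(num_gpus=1024, efficiency=0.75)
print(f"{years * 365.25:,.0f} days, ${cost:,.0f}")  # ~169 days, ~$6.1M

At any efficiency below 100 percent, the total billed GPU-hours, and therefore the cost, exceed the single-GPU figure, which is Li's point.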
This estimate is still simplified. Training a neural network is hardly a
one-shot process. It involves a lot of trial and error, and engineers must
often change the settings and retrain the network to obtain optimal
performance.
“There are certainly behind-the-scenes costs as well: parameter tuning,
the prototyping that it takes to get a finished model, the cost of
researchers, so it certainly was expensive to create GPT-3,” says Nick
Walton, the co-founder of Latitude and the creator of AI Dungeon, a
text-based game built on GPT-2.
Walton said that the real cost of the research behind GPT-3 could be
anywhere between … and … times the cost of training the final model, but
he added, “It’s really hard to say without knowing what their process
looks like internally.”
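Walton's multiplier framing is easy to make concrete. The figures below are hypothetical multipliers chosen for illustration, not his estimate.

# Hypothetical illustration of total research cost as a multiple of
# the final training run; the multipliers are invented, not Walton's.
FINAL_RUN_COST = 4.6e6  # USD, the idealized single-run estimate
for multiplier in (2, 3, 5):
    print(f"{multiplier}x the final run -> ${multiplier * FINAL_RUN_COST:,.0f}")
# Every discarded prototype, tuning sweep, and retrained variant adds
# another fraction of a full run to the total bill.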
GOING TO A FOR-PROFIT MODEL
OpenAI was founded in late 2015 as a nonprofit research lab with the
mission to develop human-level AI for the benefit of all humanity.
Among its founders were Tesla CEO Elon Musk and Sam Altman,
former Y Combinator president, who collectively donated $1 billion to
the lab’s research. Altman later became the CEO of OpenAI.