64 Science & technology The Economist August 8th 2020
useful in many of the more bad-tempered
corners of the internet. Human readers
struggled to distinguish between news ar-
ticles written by the machine and those
written by people (see chart).
Given that OpenAI wants eventually to
sell GPT-3, these results are promising. But
the program is not perfect. Sometimes it
seems to regurgitate snippets of memo-
rised text rather than generating fresh text
from scratch. More fundamentally, statis-
tical word-matching is not a substitute for
a coherent understanding of the world.
GPT-3 often generates grammatically correct
text that is nonetheless unmoored
from reality, claiming, for instance, that “it
takes two rainbows to jump from Hawaii to
17”. “It doesn’t have any internal model of
the world—or any world—and so it can’t do
reasoning that requires such a model,” says
Melanie Mitchell, a computer scientist at
the Santa Fe Institute.
Getting the model to answer questions
is a good way to dispel the smoke and mir-
rors and lay bare its lack of understanding.
Michael Nielsen, a researcher with a background
in both AI and quantum computing,
posted a conversation with GPT-3 in
which the program confidently asserted
the answer to an important open question
to do with the potential power of quantum
computers. When Dr Nielsen pressed it to
explain its apparent breakthrough, things
got worse. With no real understanding of
what it was being asked to do, GPT-3 retreated
into generic evasiveness, repeating four
times the stock phrase “I’m sorry, but I
don’t have time to explain the underlying
reason why not.”
There are also things that GPT-3 has
learned from the internet that OpenAI
must wish it had not. Prompts such as
“black”, “Jew”, “woman” and “gay” often
generate racism, anti-Semitism, misogyny
and homophobia. That, too, is down to
GPT-3’s statistical approach, and its fundamental
lack of understanding. Having been
trained partly on text scraped from the in-
ternet, it has noted that words like “wom-
an” are often associated with misogynistic
writing, and will mindlessly reproduce
that correlation when asked.
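The mechanism can be illustrated with a toy far simpler than GPT-3 itself. The sketch below is not OpenAI's method (GPT-3 is a very large neural network); it is a hypothetical bigram counter, with an invented three-sentence corpus and made-up function names, that returns whichever word most often followed the prompt word in its training text. Whatever association dominates the data is what comes back.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram_counts(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Invented toy corpus (an assumption for illustration only).
toy_corpus = [
    "the weather is awful",
    "the weather is awful today",
    "the weather is lovely",
]
toy_counts = train_bigram_counts(toy_corpus)
print(most_likely_next(toy_counts, "is"))  # prints "awful"
```

Two of the three toy sentences follow "is" with "awful", so that is the continuation the counter offers. It has no way of knowing whether the weather is in fact awful.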
This problem is a hot topic in AI research.
Facial-recognition systems, for instance,
notoriously do better with white
faces than black ones, since white faces are
more common in their training sets. AI researchers
are trying to tackle the problem.
Last year IBM released a set of training images
that contained a more diverse mix of
faces. OpenAI itself was founded to examine
ways to mitigate the risk posed by AI
systems, which makes GPT-3’s lapses all the
more noteworthy. GPT-2, its predecessor,
was released in 2019 with a filter that tried
to disguise the problem of regurgitated big-
otry by limiting the model’s ability to talk
about sensitive subjects.
Here, at least, little progress seems to
have been made. GPT-3 was released without
a filter, though it seemed just as ready
to reproduce unpleasant prejudices as its
predecessor (OpenAI added a filter to the
newer model after that fact became obvious).
It is unclear exactly how much quality
control OpenAI applied to GPT-3’s training
data, but the huge quantity of text involved
would have made any attempt daunting.
It will only get harder in future. Language
has overtaken vision as the branch of
AI with the biggest appetite for data and
computing power, and the returns to scale
show no signs of slowing. GPT-3 may well
be dethroned by an even more monstrously
complex and data-hungry model before
long. As the real Dr Seuss once said: “The
more that you read, the more things you
will know.” That lesson, it seems, applies to
machines as well as toddlers.
[Chart: “Look who’s writing”. One panel: people identifying AI-generated news articles, %, for the GPT-3 text generator with varying number of parameters; 50% is equivalent to guessing at random, and lower values mean the model is better at fooling people. Another panel: largest* AI text generators, by release date, 2018 to 2020, with number of parameters on a log scale. *Most parameters. Sources: Hugging Face; Microsoft; OpenAI]
Since the beginning of the coronavirus
pandemic, many places have struggled
with overwhelmed laboratories and a
shortage of testing kits. In March, Germany
was carrying out half the tests it needed. In
Britain testing was limited until May to
health-care workers, hospital patients and
key workers. In America shortages of va-
rious components required for testing
have been a cause of constant frustration.
Now, as countries emerge from their lock-
downs and case numbers begin to rise, the
strain is being felt once more.
America carries out roughly 800,000
tests a day. A study published by Harvard
University, however, reckons that the
country would need to carry out 5m a day in
order to reopen safely. Quest Diagnostics
and LabCorp, two of the largest test-makers
in America, have reported that over-
whelmed laboratories mean that results
are taking a week, sometimes two, to come
through, instead of a couple of days.
A technique developed in the 1940s by
Robert Dorfman, an American economist,
may help resolve the problem. Dorfman
proposed it as a way of testing soldiers en
masse for syphilis. It is, in fact, quite obvi-
ous: pool together samples taken from sev-
eral individuals and test the pool. If it is
clear, none of its members is infected, and
only one test has been used. Only if the
pool comes up positive is individual test-
ing required.
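Dorfman's two-stage procedure can be sketched in a few lines of Python. This is a minimal illustration, not laboratory practice: it assumes a perfect test (a pool is positive exactly when at least one member is infected), and the function name and the sample data are invented for the example.

```python
from typing import List, Sequence, Tuple


def dorfman_pool_test(
    samples: Sequence[bool], pool_size: int
) -> Tuple[List[bool], int]:
    """Two-stage pooled testing over a list of true infection statuses.

    Assumption (hypothetical): the test is perfect, so a pool is
    positive exactly when at least one of its members is infected.
    Returns (per-person results, number of tests used).
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")

    results: List[bool] = []
    tests_used = 0
    for start in range(0, len(samples), pool_size):
        pool = list(samples[start:start + pool_size])
        tests_used += 1  # one test for the whole pool
        if not any(pool):
            # Pool is clear: every member is cleared with that one test.
            results.extend([False] * len(pool))
        else:
            # Pool is positive: retest each member individually
            # (a pool of one person needs no retest).
            if len(pool) > 1:
                tests_used += len(pool)
            results.extend(pool)
    return results, tests_used


# Invented example: 20 people, one of whom is infected.
people = [False] * 20
people[9] = True
_, used = dorfman_pool_test(people, pool_size=5)
print(used)  # prints 9
```

In this made-up batch, pools of five need nine tests (four pooled, then five individual retests of the one positive pool) instead of 20.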
Pool-sampling has been used in Ameri-
ca, Germany and Israel and has been intro-
duced into China, India, Pakistan and Sin-
gapore. Sandra Ciesek at the University
Hospital, Frankfurt, in Germany, says that
if it were to do only individual testing, her
hospital could process about 2,000 people
a week. Now it can test ten times that num-
ber, which means tests can be given to ev-
ery patient that is admitted, for any reason.
Testing pooled samples has its difficul-
ties. For now, samples must be labelled by
hand, which is slow. There are also con-
cerns about loss of sensitivity that may re-
sult from dilution if too many samples are
mixed. A group of researchers from Tech-
nion, Israel’s oldest university, and Ram-
bam Health Care Campus, in Haifa, have
said that up to 64 samples could be mixed,
but they acknowledge that a pool this large
would be difficult to manage and could
have a higher risk of a false-negative result.
Peter Iwen, director of Nebraska’s Pub-
lic Health Laboratory, is using tests with
high sensitivity, and in pools of no more
than five samples. “No test is 100%,” he
says. “We feel very confident we can pick
up at least 97% or better.” His was one of the
first laboratories in America to use pool-
sampling, after getting permission from
Nebraska’s governor in March. On July 18th
America’s Food and Drug Administration
issued its first emergency authorisation for
the whole country to follow suit.
Besides requiring high sensitivity, pool-
sampling works best when the incidence
rate is low. The more likely a positive re-
sult, the less efficient it is—since positive
batches then have to be tested individually.
It is best used, therefore, on the asymptom-
atic, since those with symptoms are more
likely to test positive. But at the beginnings
and ends of outbreaks, when most candi-
dates for testing are, indeed, people with-
out symptoms, it looks like a valuable
time- and money-saving tool that might
become standard procedure.
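The link between incidence and efficiency can be made concrete with a little arithmetic. Assuming a perfect test and infections that are independent of one another (simplifications the article does not make), a pool of k samples is positive with probability 1 - (1 - p)^k when a fraction p of candidates is infected, so two-stage pooling uses on average 1/k + 1 - (1 - p)^k tests per person. The Python sketch below, whose function names are invented and whose ceiling of 64 borrows the largest pool mentioned above, evaluates that expression.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Average tests per person under two-stage pooling.

    Assumptions (hypothetical): perfect test, independent infections.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if pool_size == 1:
        return 1.0  # individual testing: exactly one test each
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one retest
    # per person whenever the pool comes up positive.
    return 1.0 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 64) -> int:
    """Pool size (1..max_pool_size) with the fewest expected tests."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


for p in (0.01, 0.05, 0.50):
    k = best_pool_size(p)
    print(p, k, round(expected_tests_per_person(p, k), 3))
```

Under these assumptions, at an incidence of 1% the minimum falls at pools of 11 and roughly 0.2 tests per person; at 50% no pool beats testing everyone individually. The model ignores dilution, which is one reason a laboratory might prefer smaller pools than it suggests.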
Covid-19 testing
Dive in
Testing laboratories are overwhelmed.
Pool-sampling may be the solution