RICHARD COOKE (@rgcooke) is a writer and author
whose work has appeared in The Washington Post,
The New York Times, the Paris Review, and The New
Republic. His most recent book is On Robyn Davidson.
The site has helped its fellow tech behemoths,
though, especially with the march
of AI. Wikipedia’s liberal content licenses
and vast information hoard have allowed
developers to train neural networks much
more quickly, cheaply, and widely than
proprietary data sets ever could have.
When you ask Apple’s Siri or Amazon’s
Alexa a question, Wikipedia helps provide
the answer. When you Google a famous
person or place, Wikipedia often informs
the “knowledge panel” that appears
alongside your search results.
These tools were made possible by a
project called Wikidata, the next ambitious
step toward realizing the age-old
dream of creating a “World Brain.” It
began with a Croatian computer scientist
and Wikipedia editor named Denny
Vrandečić. He was enthralled with the
online encyclopedia’s content but felt frustrated
that users could not ask it questions
that required drawing on knowledge from
multiple entries across the site. Vrandečić
wanted Wikipedia to be able to answer a
query like “What are the 20 largest cities
in the world that have a female mayor?”
“The knowledge is obviously in Wikipedia,
but it’s hidden,” Vrandečić told me. To
get it out “would be huge work.”
Drawing on an idea from the early internet
called “the semantic web,” Vrandečić
set out to structure and enrich Wikipedia’s
data set so that it could, in effect, begin
to synthesize its own knowledge. If there
were some way to tag women and mayors
and cities by population size, then a correctly
coded query could return the 20
largest cities with a female mayor automatically.
Vrandečić had edited Wikipedia
in Croatian, English, and German,
so he recognized the limitations of using
plain English semantic tagging. Instead, he
chose numerical codes. Any reference to
the book Treasure Island might be tagged
with the code Q185118, for example, or the
color brown with Q47071.
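To make the idea concrete, here is a minimal sketch in Python of how tagged statements could answer the mayor question. It is an illustration only, not Wikidata's actual software or query language: every identifier below (the Q and P codes, the helper names) is a placeholder invented for this example, and the three cities and their populations are made-up toy data.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str   # numerical code of the item being described
    prop: str      # numerical code of the property
    value: object  # another item's code, or a literal such as a number


# Placeholder codes invented for this sketch; they are NOT real Wikidata IDs.
CITY, FEMALE, MALE = "Q900001", "Q900002", "Q900003"
INSTANCE_OF, MAYOR, GENDER, POPULATION = "P9001", "P9002", "P9003", "P9004"

# Toy data: three hypothetical cities and their hypothetical mayors.
STATEMENTS = [
    Statement("Q900010", INSTANCE_OF, CITY),
    Statement("Q900010", POPULATION, 5_000_000),
    Statement("Q900010", MAYOR, "Q900020"),
    Statement("Q900020", GENDER, FEMALE),
    Statement("Q900011", INSTANCE_OF, CITY),
    Statement("Q900011", POPULATION, 8_000_000),
    Statement("Q900011", MAYOR, "Q900021"),
    Statement("Q900021", GENDER, MALE),
    Statement("Q900012", INSTANCE_OF, CITY),
    Statement("Q900012", POPULATION, 2_000_000),
    Statement("Q900012", MAYOR, "Q900022"),
    Statement("Q900022", GENDER, FEMALE),
]


def values(subject: str, prop: str) -> list:
    """Return every value recorded for one property of one item."""
    return [s.value for s in STATEMENTS
            if s.subject == subject and s.prop == prop]


def largest_cities_with_female_mayor(limit: int) -> list:
    """Combine facts from several items: city, mayor, gender, population."""
    cities = {s.subject for s in STATEMENTS
              if s.prop == INSTANCE_OF and s.value == CITY}
    matches = []
    for city in cities:
        has_female_mayor = any(FEMALE in values(mayor, GENDER)
                               for mayor in values(city, MAYOR))
        populations = values(city, POPULATION)
        if has_female_mayor and populations:
            matches.append((max(populations), city))
    # Largest population first; ties broken by item code for stable output.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [city for _, city in matches[:limit]]


print(largest_cities_with_female_mayor(20))
```

Because every fact is stored as a language-neutral code rather than an English word, the same query would work no matter which language the original article was written in.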
Vrandečić assumed this coding and tagging would have to be
carried out by bots. But of the 80 million items that have been
added to Wikidata so far, around half have been entered by human
volunteers, a level of crowdsourcing that has surprised even Wikidata’s
creators. Editing Wikidata and editing Wikipedia, it turns
out, are different enough that they don’t cannibalize the same
contributors. Wikipedia attracts people interested in writing
prose, and Wikidata compels dot-connectors, puzzle-solvers,
and completionists. (Its product manager, Lydia Pintscher, still
comes home from a movie and manually copies the cast list from
IMDb into Wikidata with the appropriate tags.)
As platforms like Google and Alexa work to provide instant
answers to random questions, Wikidata will be one of the key
architectures that link the world’s information together. The
system still results in errors sometimes—that’s why Siri briefly
thought Bulgaria’s national anthem was “Despacito”—but its
prospective scale is already more ambitious than Wikipedia’s.
There are subprojects aiming to itemize every sitting politician
on earth, every painting in every public collection worldwide,
and every gene in the human genome into searchable, adaptable,
and machine-readable form.
The jokes will still be there. Consider Wikidata’s numerical tag
for the author Douglas Adams, Q42. In Adams’ book The Hitchhiker’s
Guide to the Galaxy, a group of hyperintelligent beings build
a vast, powerful computer called Deep Thought, which they ask
for the “Answer to the Ultimate Question of Life, the Universe,
and Everything.” What comes out is the number 42. That wink
of self-awareness—at the folly and joy of building something as
preposterous and powerful as a world brain—is why, with Wikipedia,
you know you are getting the best possible information.