The Rules of Contagion


knew a US citizen’s age, gender, and ZIP code, in many cases you
could narrow it down to a single person. At the time, several medical
databases included these three pieces of information. Combine them
with an electoral register and Sweeney reckoned you could probably
work out whose medical records you were looking at.[41]
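
Sweeney's hypothesis is, at heart, a question about uniqueness: how many records share each combination of age, gender, and ZIP code? A minimal sketch, using a handful of invented records rather than any real dataset, counts the combinations that occur exactly once. Each of those singles out one person, and so could in principle be matched against another list that carries names.

```python
from collections import Counter

# Hypothetical toy records of (age, gender, ZIP code), invented for illustration.
records = [
    (34, "F", "02138"),
    (34, "F", "02138"),
    (51, "M", "02139"),
    (29, "F", "02140"),
    (51, "M", "02141"),
]

def unique_fraction(rows):
    """Return the fraction of rows whose (age, gender, ZIP) combination
    appears exactly once, i.e. rows that point to a single person."""
    counts = Counter(rows)
    singletons = sum(1 for row in rows if counts[row] == 1)
    return singletons / len(rows)

print(unique_fraction(records))  # 0.6: three of the five rows are unique
```

Only the two identical 34-year-old women in 02138 are protected by sharing a combination; adding more columns tends to shrink such groups further.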

So that’s what she did. ‘To test my hypothesis, I needed to look up
someone in the data,’ she later recalled.[42] The state of
Massachusetts had recently made ‘anonymised’ hospital records
freely available to researchers. Although Governor William Weld had
claimed the records still protected patients’ privacy, Sweeney’s
analysis suggested otherwise. She paid $20 to access voter records
for Cambridge, where Weld lived, then cross-referenced his age,
gender, and ZIP code against the hospital dataset. She soon found
his medical records, then mailed him a copy. The experiment – and
the publicity it generated – would eventually lead to major changes in
how health information is stored and shared in the US.[43]
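
The cross-referencing step is essentially a join between two tables on their shared columns. The sketch below is illustrative only: both tables, the names, and the diagnoses are invented, and the column names `age`, `gender`, and `zip` are assumptions. A hospital record is treated as re-identified when its combination matches exactly one entry in the voter list.

```python
# Hypothetical, invented data: neither table reflects any real records.
hospital = [
    {"age": 62, "gender": "M", "zip": "02138", "diagnosis": "condition A"},
    {"age": 45, "gender": "F", "zip": "02139", "diagnosis": "condition B"},
]
voters = [
    {"name": "Voter 1", "age": 62, "gender": "M", "zip": "02138"},
    {"name": "Voter 2", "age": 45, "gender": "F", "zip": "02140"},
]

def link(hospital_rows, voter_rows):
    """Join two tables on (age, gender, ZIP). A key that matches exactly
    one voter attaches a name to the corresponding hospital record."""
    index = {}
    for v in voter_rows:
        key = (v["age"], v["gender"], v["zip"])
        index.setdefault(key, []).append(v["name"])
    matches = []
    for h in hospital_rows:
        names = index.get((h["age"], h["gender"], h["zip"]), [])
        if len(names) == 1:  # unique match: the record is no longer anonymous
            matches.append((names[0], h["diagnosis"]))
    return matches

print(link(hospital, voters))  # [('Voter 1', 'condition A')]
```

Neither table is sensitive on its own; the risk comes from the columns they have in common.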

As data spread from one computer to another, so do the resulting
insights into people’s lives. It’s not just medical or genetic information
we need to be careful with; even seemingly innocuous datasets can
hold surprisingly personal details. In March 2014, a self-described
‘data junkie’ named Chris Whong used the Freedom of Information
Act to request details of every yellow taxi ride in New York City
during the previous year. When the New York City Taxi and
Limousine Commission released the dataset, it included the time and
location of the pick-up and drop-off, the fare, and how much each
passenger tipped.[44] There were over 173 million trips in total.
Rather than give the real licence plates, each taxi was identified by a
string of apparently random digits. But it turned out the journeys
were anything but anonymous. Three months after the dataset was
released, computer scientist Vijay Pandurangan showed how to
decipher the taxi codes, converting the scrambled digits back into
the original licence plates. Then graduate student Anthony Tockar
published a blog post explaining what else could be discovered. He’d
found that with a few simple tricks, it was possible to extract a lot of
sensitive information from the files.[45]
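
The passage does not say how the taxi codes were reversed. One common weakness, assumed here rather than taken from the text, is an unsalted hash applied to identifiers that follow a small, predictable format: anyone can hash every possible identifier and look the released code up. The sketch uses MD5 from Python's `hashlib` and a hypothetical digit-letter-digit-digit plate format; both choices are assumptions for illustration.

```python
import hashlib
import itertools
import string

def md5_hex(text):
    """Unsalted MD5 digest of an identifier, as a hex string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

def candidate_plates():
    """Enumerate every ID of the hypothetical form digit-letter-digit-digit."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"

def build_reverse_table():
    """Hash every possible ID once; the dict maps each digest back to its ID."""
    return {md5_hex(plate): plate for plate in candidate_plates()}

# Simulate a released "anonymised" code for a hypothetical plate.
released_code = md5_hex("7K42")
table = build_reverse_table()
print(table[released_code])  # 7K42
```

With only 26,000 possible identifiers, the whole table is built in well under a second, so the scrambling offers little protection. Codes derived from a secret key, or random tokens with no published mapping, could not be reversed by this kind of enumeration.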
