New Scientist - USA (2022-04-09)




Features


server, owned by an external company.
You can’t share raw data unthinkingly. It
will typically contain sensitive personal details,
anything from names and addresses to voting
records and medical information. There is an
obligation to keep this information private,
not just because it is the right thing to do,
but because of stringent privacy laws, such as
the European Union’s General Data Protection
Regulation (GDPR). Breaches can incur big fines.
Over the past few decades, we have come up
with ways of trying to preserve people’s privacy
while sharing data. The traditional approach
is to remove information that could identify
someone or make these details less precise,
says privacy expert Yves-Alexandre de
Montjoye at Imperial College London.
You might replace dates of birth with an
age bracket, for example. But that is no
longer enough. “It was OK in the 90s, but
it doesn’t really work any more,” says de
Montjoye. There is an enormous amount of
information available about people online,
so even seemingly insignificant nuggets can
be cross-referenced with public information
to identify individuals.
One significant case of reidentification
from 2021 involves apparently anonymised
data sold to a data broker by the dating app
Grindr, which is used by gay people among
others. A media outlet called The Pillar
obtained it and correlated the location pings
of a particular mobile phone represented in
the data with the known movements of a
high-ranking US priest, showing that the
phone popped up regularly near his home
and at the locations of multiple meetings he
had attended. The implication was that this
priest had used Grindr, and a scandal ensued

Extreme encryption


A clever form of cryptography allows us to see data
without ever looking at it. Could this dispel the privacy
fears that hobble big data? Edd Gent investigates


LIKE any doctor, Jacques Fellay wants to
give his patients the best care possible.
But his instrument of choice is no scalpel
or stethoscope; it is far more powerful than
that. Hidden inside each of us are genetic
markers that can tell doctors like Fellay which
individuals are susceptible to diseases such as
AIDS, hepatitis and more. If he can learn to read
these clues, then Fellay will have advance
warning of who requires early treatment.
This could be life-saving. The trouble is,
teasing out the relationships between genetic
markers and diseases requires an awful lot of
data, more than any one hospital has on its
own. You might think hospitals could pool
their information, but it isn’t so simple.
Genetic data contains all sorts of sensitive
details about people that could lead to
embarrassment, discrimination or worse.
Ethical worries of this sort are a serious
roadblock for Fellay, who is based at Lausanne
University Hospital in Switzerland. “We have
the technology, we have the ideas,” he says.
“But putting together a large enough data
set is more often than not the limiting factor.”
Fellay’s concerns are a microcosm of one of
the world’s biggest technological problems.
The inability to safely share data hampers
progress in all kinds of other spheres too, from
detecting financial crime to responding to
disasters and governing nations effectively.
Now, a new kind of encryption is making it
possible to wring the juice out of data without
anyone ever actually seeing it. This could help
end big data’s big privacy problem – and Fellay’s
patients could be some of the first to benefit.
It was more than 15 years ago that we first
heard that “data is the new oil”, a phrase coined
by the British mathematician and marketing
expert Clive Humby. Today, we are used to the
idea that personal data is valuable. Companies
like Meta, which owns Facebook, and Google’s
owner Alphabet grew into multibillion-dollar
behemoths by collecting information about
us and using it to sell targeted advertising.
Data could do good for all of us too. Fellay’s
work is one example of how medical data
might be used to make us healthier. Plus,
Meta shares anonymised user data with aid
organisations to help plan responses to floods
and wildfires, in a project called Disaster Maps.
And in the US, around 1400 colleges analyse
academic records to spot students who are
likely to drop out and provide them with
extra support. These are just a few examples
out of many – data is a currency that helps
make the modern world go around.
Getting such insights often means
publishing or sharing the data. That way,
more people can look at it and conduct
analyses, potentially drawing out unforeseen
conclusions. Those who collect the data often
don’t have the skills or advanced AI tools to
make the best use of it, either, so it pays to
share it with firms or organisations that do.
Even if no outside analysis is happening,
the data has to be kept somewhere,
which often means on a cloud storage

“Even insignificant nuggets of information can be used to identify people in the data”

