New Scientist - USA (2020-08-15)

New Scientist Events Statistics

No more damned lies...

Statistical literacy is more important than ever in the
covid crisis, statistician David Spiegelhalter told the
audience at a recent New Scientist event – but that
means turning yourself into a data detective

THERE IS a quote from Nate Silver
about statistics which I think is very
nice: “The numbers have no way of
speaking for themselves. We speak for them,
we imbue them with meaning.” You can't just
collect some data and it'll tell you the answer.
There is an art to trying to extract information,
knowledge and understanding from data,
and even in choosing what data to collect. It’s
something we’ve all been dealing with over
the past few months with covid-19: can we
believe these numbers? What do they mean?
Now if this were a live audience, I'd be
asking how many people have done stats
courses. If people put their hand up, I'd ask
how many people actually enjoyed them, and
most of the hands would go down. That
makes me upset. I love statistics, I think it's
great. But it has tended to be taught in the past
as a series of formulae and tests and
regression and things like that.
My book The Art of Statistics takes a very
different approach. It spends a lot of time on
problem solving, on things like: what are you
trying to do? Is this data suitable for what
you're trying to answer? What can we
conclude from it? It's amazing how far you
can get without ever doing any fancy
statistical methods or using probability
theory or the sampling distribution of the
sample mean and all this sort of stuff we all
had to endure – and which I've always taught,
of course.
The key is what is called the “data cycle”.
You don't start off with data, you start off with
a problem. You plan how you are going to try
to answer it. Is there any data, and what might
we collect? Then you collect data and wrangle
it and manage it and clean it up. Only then do
you come to the analysis. That’s normally the
only thing that is taught in stats courses, but
it's only a small part of the whole cycle.
It’s followed by the communication,
drawing the appropriate conclusions, putting
the message out. And then you always have
to start again. Because, as we’re going to see
again and again, doing an analysis just
leads to more questions.
I find the data cycle an immensely powerful
way to structure the use of data to solve
problems. How many sexual partners have
people had? Is it worth me taking statins?
Who was the luckiest person on the Titanic?
What's the probability that the skeleton found
in that Leicester car park really was Richard
the Third? Why do old men have big ears?

I mean, these are really important questions
for the future of humanity.
Many of these issues I've actually been
working on. An example I use in the book is
that of the doctor Harold Shipman. He
murdered at least 215 of his patients, and
probably considerably more, over 24 years
working in the Hyde suburb of Manchester.
I worked on the Shipman inquiry as one of the
statisticians who were asked to look at the
data and answer various questions. The first
question really was, what was happening?
So here's a graphic (right) that just shows
the pattern of his murders: the red dots are
women, the blue are men. The histograms at
the top and on the left show the distribution
of the years of the killings and the age of the
victims. So just from this data alone, the ages
and sex of the victims and the dates of their
deaths, what can we conclude?
Is he mainly killing old people? Yes, but it
looks like, nearer the end, some of his
victims were much younger. And do you
notice that gap of a year – did he go on
holiday? No, up to then he had been working
in a joint practice, and it's thought that he
suspected that he was being suspected. After
that, he set up his own single-handed practice,