
is called the transition matrix. If successive transitions are all independent of one
another, one can easily verify that the matrix power P^k gives the probabilities of
the transitions in k steps.
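As a small illustration (not part of the original text), here is a minimal Python sketch
with a hypothetical two-state transition matrix P: under the independence assumption just
stated, the (i, j) entry of the power P^k is the probability of passing from state i to
state j in k steps.

    import numpy as np

    # Hypothetical one-step transition matrix for a two-state chain;
    # each row gives the probabilities of moving to states 0 and 1 and sums to 1.
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    k = 3
    P_k = np.linalg.matrix_power(P, k)
    print(P_k)               # entry (i, j): probability of going from state i to state j in k steps
    print(P_k.sum(axis=1))   # each row of P^k still sums to 1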

2. Statistics


The subject of probability formed the theoretical background for the empirical science
known as statistics. Some theoretical analysis of the application of probability
to hypothesis testing and modification is due to Thomas Bayes (1702-1761), a
British clergyman. Bayes' articles were published in 1764-1765 (after his death) by
Rev. Richard Price (1723-1791). Bayes considered the problem opposite to that
considered by Jakob Bernoulli. Where Bernoulli assigned probabilities to the event
of getting k successes in n independent trials, assuming the probability of success
in each trial was p, Bayes analyzed the problem of finding the probability p based
on an observation that k successes and n - k failures have occurred. In other words,
he tried to estimate the parameter in a distribution from observed data. His claim
was that p would lie between a and b with a probability proportional to the area
under the curve y = x^k(1 - x)^{n-k} between those limits.
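In modern terms (a restatement, not Bayes' own language), taking a uniform prior for p
makes the normalized area under x^k(1 - x)^{n-k} the Beta(k + 1, n - k + 1) distribution,
so the probability Bayes describes can be computed directly. A minimal Python sketch,
with made-up values for k, n, a, and b:

    from scipy.stats import beta

    k, n = 7, 10        # hypothetical data: 7 successes in 10 trials
    a, b = 0.5, 0.8     # interval asked about for the unknown probability p

    # Normalized area under x^k (1 - x)^(n - k) between a and b,
    # i.e., the Beta(k + 1, n - k + 1) probability of the interval [a, b].
    posterior = beta(k + 1, n - k + 1)
    print(posterior.cdf(b) - posterior.cdf(a))   # Pr(a <= p <= b | k successes in n trials)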
He then analyzed a more elaborate example. Suppose we know that the probability of event B is p, given
that event A has occurred. Suppose also that, after a number of trials, without
reference to whether A has occurred or not, we find that event B has occurred m
times and has not occurred n times. What probability should be assigned to A?
Bayes' example of event A was a line drawn across a billiard table parallel to one of
its sides at an unknown distance x from the left-hand edge. A billiard ball is rolled
at random across the table, coming to rest on the left of the line m times and on
the right of it n times. Assuming that the width of the table is a, the probability
of the ball resting left of the line is x/a, and the probability that it rests on the
right is 1 - x/a. How can we determine x from the actual observed frequencies
m and n? Bayes' answer was that the probability that x lies between b and c is
proportional to the area under the curve y = x^m(a - x)^n between those two values.
This first example of statistical estimation is also the first maximum-likelihood
estimation, since the "density" function x^m(a - x)^n has its maximum value where
m(a - x) = nx, that is, x = am/(m + n), so that the proportion m : n = x : (a - x) holds. It
seems intuitively reasonable that the most likely value of x is the value that makes
this proportion correct, and that the likelihood decreases as x moves away from
this value.
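(The claim about the maximum is a routine calculus check, not spelled out in the original:
differentiating the density gives

    d/dx [x^m (a - x)^n] = x^{m-1} (a - x)^{n-1} [m(a - x) - nx],

which vanishes for 0 < x < a exactly when m(a - x) = nx, that is, x = am/(m + n).)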
This is the kind of reasoning used by Gauss in his 1816 paper on the
estimation of observational errors to find the parameter (measure of precision) in
the normal distribution. To derive this result, Bayes had to introduce the concept
of conditional probability. The probability of A, given that B has occurred, is equal
to the probability that both events happen divided by the probability of B. (If B
has occurred, it must have positive probability, and therefore the division is legit-
imate.) Although Bayes stated this much with reasonable clarity (see Todhunter,
1865, p. 298), the full statement of what is now called Bayes' theorem (see below)
is difficult to discern in his analysis.
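In modern notation (a standard restatement rather than Bayes' own formulation), the
definition just described and the theorem now named for him are usually written

    P(A | B) = P(A and B) / P(B)        (provided P(B) > 0),

    P(A | B) = P(B | A) P(A) / P(B).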
The word statistics comes from the state records of births, deaths, and other
economic facts that governments have always found it necessary to keep for ad-
ministrative purposes. The raw data form far too large a set of numbers to be
analyzed individually in most cases, and that is where probabilistic models and
inverse-probability reasoning, such as that used by Bayes and Gauss, become most
