516 18. PROBABILITY AND STATISTICS
numbers, defined by the formula
\[
\sum_{k=1}^{n} k^{c} = \frac{n^{c+1}}{c+1} + \frac{1}{2} n^{c} + \frac{c}{2} B_2 n^{c-1}
  + \frac{c(c-1)(c-2)}{2 \cdot 3 \cdot 4} B_4 n^{c-3} + \cdots.
\]
Nowadays we define these numbers as B_0 = 1, B_1 = −1/2, and thence B_2 = 1/6,
B_4 = −1/30, and so forth. He illustrated his formula by finding
\[
\sum_{k=1}^{1000} k^{10} = 91409924241424243424241924242500.
\]
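Bernoulli's computation is easy to check today. The following Python sketch (not from the original text) verifies the sum of tenth powers both directly and through the Bernoulli-number formula, using the modern convention B_1 = −1/2; the recursion used to generate the numbers is the standard modern one, not Bernoulli's own procedure.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(m):
    # B_0..B_m via the standard recursion
    # sum_{j=0}^{n} C(n+1, j) B_j = 0 for n >= 1, which yields B_1 = -1/2.
    B = [Fraction(1)]
    for n in range(1, m + 1):
        s = sum(comb(n + 1, j) * B[j] for j in range(n))
        B.append(-s / (n + 1))
    return B

def sum_of_powers(n, c):
    # With the B_1 = -1/2 convention, (1/(c+1)) * sum_j C(c+1, j) B_j n^(c+1-j)
    # equals 1^c + ... + (n-1)^c, so we add n^c to include the last term.
    B = bernoulli_numbers(c)
    total = sum(comb(c + 1, j) * B[j] * Fraction(n) ** (c + 1 - j)
                for j in range(c + 1))
    return total / (c + 1) + n ** c

direct = sum(k ** 10 for k in range(1, 1001))
print(direct)                              # 91409924241424243424241924242500
print(sum_of_powers(1000, 10) == direct)   # True
```

The exact-rational arithmetic of `Fraction` avoids any rounding in the Bernoulli coefficients, so the two computations agree digit for digit with the value Bernoulli reported.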
The law of large numbers. Bernoulli imagined an urn containing numbers of black
and white pebbles, whose ratio is to be determined by sampling with replacement.
Here it is possible that you will always get a white pebble, no matter how many
times you sample. However, if black pebbles constitute a significant proportion of
the contents of the urn, this outcome is very unlikely. After discussing the degree of
certainty that would suffice for practical purposes (he called it virtual certainty),^5
he noted that this degree of certainty could be attained empirically by taking a
sufficiently large sample. The probability that the empirically determined ratio
would be close to the true ratio increases as the sample size increases, but the
result would be accurate only within certain limits of error. More precisely, given
certain limits of tolerance, by a sufficient number of trials,
[W]e can attain any desired degree of probability that the ratio
found by our many repeated observations will lie between these
limits.
This last assertion is an informal statement of the law of large numbers for
what are now called Bernoulli trials, that is, repeated independent trials with the
same probability of a given outcome at each trial. If the probability of the outcome
is p and the number of trials is n, this law can be phrased precisely by saying that
for any ε > 0 there exists a number n₀ such that if m is the number of times
the outcome occurs in n trials and n > n₀, the probability that the inequality
|(m/n) − p| > ε will hold is less than ε.^6 Bernoulli stated this principle in terms
of the segment of the binomial series of (r + s)^{n(r+s)} consisting of the n terms on
each side of the largest term (the term containing r^{nr} s^{ns}), and he proved it by
giving an estimate of n sufficient to make the ratio of this sum to the sum of the
remaining terms at least c, where c is specified in advance. This is the earliest
problem in which probability and statistics were combined to answer a question of
practical importance.
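Bernoulli's weak law is easy to illustrate by simulation. In the sketch below (illustrative only: the urn composition p = 0.3, the tolerance ε = 0.05, and the repetition count are arbitrary choices, not values from the text), each experiment samples the urn n times with replacement, and repeating the experiment estimates the probability that the empirical ratio m/n misses p by more than ε.

```python
import random

def empirical_ratio(p, n, rng):
    # One experiment: n draws with replacement, success probability p;
    # return the observed ratio of successes m/n.
    m = sum(1 for _ in range(n) if rng.random() < p)
    return m / n

def tail_probability(p, n, eps, reps, rng):
    # Estimate P(|m/n - p| > eps) by repeating the experiment reps times.
    misses = sum(1 for _ in range(reps)
                 if abs(empirical_ratio(p, n, rng) - p) > eps)
    return misses / reps

rng = random.Random(1)  # fixed seed so the run is reproducible
p, eps = 0.3, 0.05
for n in (50, 500, 5000):
    print(n, tail_probability(p, n, eps, 200, rng))
```

As n grows, the estimated tail probability falls toward zero, which is exactly Bernoulli's claim: any desired degree of probability that the observed ratio lies within the given limits can be attained by taking enough trials.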
(^5) This phrase is often translated more literally as moral certainty, which has the wrong
connotation.
(^6) Probabilists say that the frequency of successes converges "in probability" to the probability
of success at each trial. Analysts say it converges "in measure." There is also a strong law of
large numbers, more easily stated in terms of independent random variables, which asserts that
(under suitable hypotheses) there is a set of probability 1 on which the convergence to the mean
occurs. That is, the convergence holds "almost surely," as probabilists say, and "almost everywhere,"
as analysts phrase the matter. On a finite measure space such as a probability space, almost
everywhere convergence implies convergence in measure, but the converse is not true.