leave this topic without at least mentioning one of the giants in this area, Karl
Pearson (1856-1936), a British polymath who studied philosophy and law and was
even called to the bar, although he never practiced. As professor at University
College, London, he became interested in Darwin's theory of evolution and wrote
a number of papers between 1893 and 1912 on the mathematics of evolution. In
an 1893 paper he coined the term standard deviation to denote the natural unit of
probability, and in 1900 he introduced the chi-square test of significance, a mainstay
of applied statistics nowadays.^19 Mathematically, the chi-square distribution with
$n$ degrees of freedom is the distribution of the sum of the squares of $n$ independent
standard normal random variables. What that means is that if the probability that $X_k$
lies between $a$ and $b$ is given by the standard normal density,
\[ P(a < X_k < b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-t^2/2}\,dt, \]
and the variables $X_1, \ldots, X_n$ are independent of one another, the probability that
$X_1^2 + \cdots + X_n^2$ lies between 0 and $c$ is given by the chi-square density with $n$ degrees
of freedom:
\[ P(X_1^2 + \cdots + X_n^2 < c) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_0^c x^{n/2-1} e^{-x/2}\,dx. \]
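The integral above can be checked numerically. What follows is only a minimal sketch in Python, not a library-grade routine: the function names `chi2_pdf` and `chi2_cdf` and the choice of the midpoint rule are assumptions made for illustration, and the approximation is intended for $n \ge 2$ (for $n = 1$ the density is unbounded near 0 and the rule converges slowly).

```python
import math


def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    # Work with logarithms so that large n does not overflow the gamma function.
    log_norm = (n / 2) * math.log(2) + math.lgamma(n / 2)
    return math.exp((n / 2 - 1) * math.log(x) - x / 2 - log_norm)


def chi2_cdf(c, n, steps=10_000):
    """Approximate P(X_1^2 + ... + X_n^2 < c) by the midpoint rule on [0, c]."""
    if c <= 0:
        return 0.0
    h = c / steps
    # Midpoints avoid evaluating the density at the endpoint x = 0.
    return h * sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps))


if __name__ == "__main__":
    # For n = 2 the integral has the closed form 1 - exp(-c/2).
    print(chi2_cdf(3.0, 2), 1 - math.exp(-1.5))
```

For $n = 2$ the density reduces to $\tfrac{1}{2}e^{-x/2}$, so the numerical value can be compared directly with $1 - e^{-c/2}$.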
The chi-square distribution is useful because, if $X_1, \ldots, X_n$ are independent random
variables with positive expected values $\mu_1, \ldots, \mu_n$, the random variable
\[ \chi^2 = \frac{(X_1 - \mu_1)^2}{\mu_1} + \cdots + \frac{(X_n - \mu_n)^2}{\mu_n} \]
has, to a good approximation, the chi-square distribution with $n - 1$ degrees of freedom.
One can then determine whether actual deviations of the variables $X_k$ from their expected
values are likely to be random (hence whether bias is present) by computing the value of
$\chi^2$ and comparing it with a table of chi-square values.
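As a small worked illustration of this procedure, the sketch below computes the statistic for a die rolled 120 times. The observed counts are hypothetical, invented for the example, and the helper name `pearson_chi2` is likewise an assumption. The value 11.070 is the usual tabulated 5% critical value for 5 degrees of freedom.

```python
def pearson_chi2(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: counts of each face in 120 rolls of a die.
observed = [25, 17, 15, 23, 24, 16]
# A fair die gives an expected count of 120 / 6 = 20 for every face.
expected = [20.0] * 6

stat = pearson_chi2(observed, expected)

# Tabulated 5% critical value of chi-square with 6 - 1 = 5 degrees of freedom.
CRITICAL_5DF_5PCT = 11.070

if __name__ == "__main__":
    verdict = "consistent with" if stat < CRITICAL_5DF_5PCT else "evidence against"
    print(f"chi-square = {stat:.3f}; {verdict} a fair die at the 5% level")
```

Here the squared deviations are 25, 9, 25, 9, 16, 16, which sum to 100, so the statistic is 100/20 = 5. That is well below 11.070, and deviations of this size are the kind one would expect from chance alone.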
To illustrate the connection between the chi-square and the standard normal
distribution, Fig. 3 shows the frequency histogram for a computer experiment in
which 1000 random values were computed for the sum of the squares of 10 standard
normal random variables. This histogram is superimposed on the graph of the chi-
square density function with 10 degrees of freedom.
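An experiment of this kind is easy to repeat. The sketch below is not the program used to produce Fig. 3. It is a minimal reconstruction using only the Python standard library, in which the seed, the bin width of 2, the cutoff at 40, and all function names are assumptions chosen for illustration. It prints the empirical bar heights beside the density values rather than drawing a graph.

```python
import math
import random


def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    log_norm = (n / 2) * math.log(2) + math.lgamma(n / 2)
    return math.exp((n / 2 - 1) * math.log(x) - x / 2 - log_norm)


def simulate_sums(trials=1000, n=10, seed=0):
    """Return `trials` values of the sum of squares of n standard normals."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]


def histogram_density(samples, bin_width=2.0, upper=40.0):
    """Return (bin midpoint, relative frequency per unit width) pairs."""
    bins = int(upper / bin_width)
    counts = [0] * bins
    for s in samples:
        k = int(s // bin_width)
        if k < bins:  # values beyond `upper` are left out of the histogram
            counts[k] += 1
    total = len(samples)
    return [((k + 0.5) * bin_width, counts[k] / (total * bin_width))
            for k in range(bins)]


samples = simulate_sums()
bars = histogram_density(samples)

if __name__ == "__main__":
    for mid, height in bars:
        print(f"{mid:5.1f}  empirical={height:.4f}  density={chi2_pdf(mid, 10):.4f}")
```

Dividing each count by the number of samples times the bin width puts the bars on the same scale as the density, so the two columns can be compared directly.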
The word bias in the preceding paragraph has a purely statistical meaning of
"not random." The rather pejorative meaning it has in everyday life is an indication
of the connection that people tend to make between fairness and equal outcomes.
If we find that some identifiable group of people is underrepresented or overrepre-
sented in some other population—prisons or universities or other institutions—we
proceed on the assumption that some cause is operating. In doing so, one must
beware of jumping to conclusions, as Arbuthnott did. He was quite correct in his
conclusion that the sexual imbalance was not a random deviation from a general
rule of equality, but there are all kinds of possible explanations for the bias. An even
larger sexual imbalance exists in China today, for example, as a result of the one-
child policy of the Chinese government, combined with a traditional social pressure
to produce male heirs. Evolutionary theory produces an explanation very similar
to Arbuthnott's, but based on adaptation rather than intelligent, human-centered
design.
^19 The symbol $\chi^2$ was Pearson's abbreviation for $x^2 + y^2$.