Evolution, 4th Edition


A STATISTICS PRIMER A–7


rather than two: the distance between the point and the mean along
PC1. The description is not perfect because the points fall a little way
off the line. By working with those distances rather than the original
pairs of measurements, we reduce the size of the data set by half. In
FIGURE A.8, an individual is identified who has a much larger than
average value for PC1 (and a slightly smaller than average value of
PC2). His value for PC1 alone describes most of the differences
between him and the average individual in the population.
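
The idea of replacing two measurements with one distance along PC1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arm and leg lengths are simulated, hypothetical values, and the function names first_pc_2d and pc1_score are made up for this sketch. For two variables, the direction of PC1 can be found from the variances and covariance with a closed-form angle.

```python
import math
import random

def first_pc_2d(points):
    """Return (mean, unit direction of PC1) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample variances and covariance of the two measurements.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Angle between the x axis and the direction of greatest variation.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))

def pc1_score(point, mean, direction):
    """Signed distance from the mean to the point's projection on PC1."""
    return (point[0] - mean[0]) * direction[0] + (point[1] - mean[1]) * direction[1]

# Hypothetical data: arm and leg lengths (cm) that are strongly correlated.
rng = random.Random(1)
people = []
for _ in range(200):
    size = rng.gauss(0, 5)  # shared "body size" effect (an assumption)
    people.append((70 + size + rng.gauss(0, 1), 90 + size + rng.gauss(0, 1)))

mean, pc1 = first_pc_2d(people)
scores = [pc1_score(p, mean, pc1) for p in people]

# Rebuild each point from its single PC1 score and measure what was lost.
total = sum((p[0] - mean[0]) ** 2 + (p[1] - mean[1]) ** 2 for p in people)
lost = sum(
    (p[0] - (mean[0] + s * pc1[0])) ** 2 + (p[1] - (mean[1] + s * pc1[1])) ** 2
    for p, s in zip(people, scores)
)
fraction_kept = 1 - lost / total
print(f"Fraction of variation captured by PC1: {fraction_kept:.2f}")
```

Each person is now described by one number in scores rather than a pair of measurements. The variation that is lost corresponds to the small distances by which the points fall off the PC1 line.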
In some situations we have more than two measurements on each
individual (for example, measures of arm length, leg length, and body
mass). In that case, additional principal components are fit in the same
way as the first two were. Each new principal component must be
perpendicular to those that have already been fit, and it must run in the
direction that has the most remaining variation (FIGURE A.9). With
three variables (as in this example), if we represent each individual in
the data set only by his value for PC1, we shrink the number of
measurements for each individual from three to one. That decreases the
number of variables we need to analyze by two-thirds. If there are
hundreds of measurements per individual, the savings are much larger yet.
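
The rule for fitting additional components (each one perpendicular to the earlier ones, and running in the direction of most remaining variation) can be sketched as follows. This is one possible approach, not necessarily how statistical software does it: it uses power iteration with deflation, and the three measurements (arm length, leg length, body mass) are simulated, hypothetical values. The function names are assumptions made for this sketch.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length measurement rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
        for i in range(k)
    ]

def dominant_direction(matrix, iterations=500):
    """Power iteration: unit vector along which the matrix has most variance."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * matrix[i][j] * v[j] for i in range(k) for j in range(k))
    return v, variance

def principal_components(rows):
    """Fit PCs one at a time, removing each fitted direction before the next."""
    remaining = covariance_matrix(rows)
    k = len(remaining)
    components = []
    for _ in range(k):
        v, variance = dominant_direction(remaining)
        components.append((v, variance))
        # Deflation: subtract the variation already explained along v.
        remaining = [
            [remaining[i][j] - variance * v[i] * v[j] for j in range(k)]
            for i in range(k)
        ]
    return components

# Hypothetical data: arm length (cm), leg length (cm), body mass (kg).
rng = random.Random(2)
rows = []
for _ in range(300):
    size = rng.gauss(0, 5)  # shared "body size" effect (an assumption)
    rows.append([70 + size + rng.gauss(0, 1),
                 90 + size + rng.gauss(0, 1),
                 75 + 2 * size + rng.gauss(0, 3)])

pcs = principal_components(rows)
total = sum(variance for _, variance in pcs)
for number, (direction, variance) in enumerate(pcs, start=1):
    print(f"PC{number}: share of variation = {variance / total:.3f}")
```

When the measurements are strongly correlated, as they are by construction here, PC1 carries most of the variation, which is why keeping only the PC1 value for each individual loses little information.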

Estimation
Asian elephants (Elephas maximus) are smaller than African elephants.
The mean weight of a male Asian elephant is 5000 kg, which is about
2000 kg less than that of its African kin. How do we know those facts? Of
course, nobody has weighed all the elephants, which is what would be
needed to know the true mean weight of these two species. Instead,
researchers have taken the weights of a number of elephants, and
from those data they estimate the means for the two species.
When we estimate a mean or other quantity, the group of individuals
that are measured is called the sample, and the group from which the sample
comes is called the population. Unless we measure every individual in the
population, there is always some uncertainty about the actual mean of the population.
Statistics lets us quantify that uncertainty.
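
The relationship between a sample and its population can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is simulated, the 5000 kg mean is taken from the text, and the 600 kg spread is a made-up value chosen only for illustration. The standard error (the sample standard deviation divided by the square root of the sample size) is used here as one common measure of the uncertainty in an estimated mean.

```python
import math
import random
import statistics

rng = random.Random(3)

# Hypothetical population: 10,000 male Asian elephants. The 5000 kg mean comes
# from the text; the 600 kg spread is an assumption made up for this sketch.
population = [rng.gauss(5000, 600) for _ in range(10_000)]
true_mean = statistics.fmean(population)

def estimate_mean(sample):
    """Return the sample mean and its standard error (s / sqrt(n))."""
    n = len(sample)
    return statistics.fmean(sample), statistics.stdev(sample) / math.sqrt(n)

for n in (10, 100, 1000):
    sample = rng.sample(population, n)  # weigh only n elephants
    mean, se = estimate_mean(sample)
    print(f"n = {n:4d}: estimate = {mean:7.1f} kg, standard error = {se:5.1f} kg")
print(f"population mean (normally unknown) = {true_mean:.1f} kg")
```

In a real study the population mean is not known, so only the estimate and its standard error are available. Larger samples tend to give smaller standard errors, that is, less uncertainty about the population mean.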

Futuyma Kirkpatrick Evolution, 4e
Sinauer Associates
Troutt Visual Services
Evolution4e_A.08.ai Date 01-09-2017

[Figure A.8 labels: axes Arm length and Leg length; principal component lines PC1 and PC2]

FIGURE A.8 The position of a point along PC1 gives a
good approximation of its location. An example is shown
for the point highlighted in red, which represents the
man shown at upper right. The population’s mean is
shown by the black point, which corresponds to a man
who looks like the one shown at lower right. The position
of the red dot along PC1 is shown by the green diamond.
By using only the distance along PC1 from the mean to
the green diamond, rather than the two original
measurements of arm length and leg length, we can reduce by
half the number of variables needed to describe the man
at upper right.

FIGURE A.9 Principal components can be used with three or
more variables. To make their locations in space clearer, the
points are colored according to their distance from the viewer
(darker points are closer). The first principal component (PC1) runs
in the direction in which there is the most variation. The second
principal component (PC2) is perpendicular to PC1, and runs in
the direction in which there is the most remaining variation. The
third principal component (PC3) is perpendicular to PC1 and PC2.
The lengths of the lines for the principal components are again
proportional to the amount of variation (the standard deviation) in
that direction.


[Figure A.9 labels: axes Measurement 1, Measurement 2, and Measurement 3; principal component lines PC1, PC2, and PC3]
