A STATISTICS PRIMER A–11
where x! stands for x × (x – 1) ... 3 × 2 × 1. (This equation comes
from the binomial distribution, which often appears in statistics.)
With that formula, we find that if the actual frequency of
A 1 in the population is p 1 = 0.5, then the likelihood of our data
is L = 0.0046.
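As a rough check on this number, the calculation can be sketched in Python. The function name `binomial_likelihood` is an illustrative choice, not something from the text, and the sketch assumes the formula is the standard binomial probability, applied to the sample of 4 copies of A 1 among 20 genes.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Probability of seeing k copies of an allele among n sampled genes
    when its true population frequency is p (binomial distribution)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sample from the text: 4 copies of A1 among 20 genes, evaluated at p1 = 0.5
L = binomial_likelihood(4, 20, 0.5)
print(round(L, 4))  # 0.0046
```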
FIGURE A.12 shows how the likelihood (that is, probability of
the observed data) varies as p 1 ranges from 0 to 1. This is called
the likelihood function. The likelihood function reaches its greatest
value, L = 0.22, with p 1 = 0.2. That is, the data are most likely if
0.2 is the true frequency of A 1 in the population we sampled.
This value of p 1 is called the maximum likelihood estimate of the allele
frequency. In this example, the maximum likelihood estimate
corresponds to common sense: it equals the frequency of A 1 in
our actual sample of genes (4/20 = 0.2). In other situations, the
maximum likelihood estimate cannot be found from an average
or other simple summary statistic.
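When no simple summary statistic is available, one option is to evaluate the likelihood over many candidate values and keep the best one. The sketch below does this for the platypus sample. The grid spacing of 0.001 and the helper name `binomial_likelihood` are assumptions made for illustration. A numerical optimizer would serve equally well.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Binomial probability of k copies among n genes at frequency p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Candidate allele frequencies 0.000, 0.001, ..., 1.000 (assumed grid)
grid = [i / 1000 for i in range(1001)]
likelihoods = [binomial_likelihood(4, 20, p) for p in grid]

# Index of the grid point with the greatest likelihood
best = max(range(len(grid)), key=lambda i: likelihoods[i])
p_hat = grid[best]
L_max = likelihoods[best]
print(p_hat, round(L_max, 2))  # 0.2 0.22
```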
The complete likelihood function gives us more information
than just its maximum. It also conveys the range of values of p 1 that
are plausible. The maximum likelihood estimate suggests that the
frequency of A 1 is somewhere near 0.2, but it is almost certainly not
exactly equal to 0.2. It is often useful to consider the confidence
interval, which is the range of values in which the real value of p 1 is
very likely to lie. A rule commonly used is to determine the range of
values of p 1 for which the likelihood L is no more than seven times
smaller than the maximum likelihood. We can be approximately 95 percent
certain that the true value of p 1 lies within that range. In our example,
the maximum likelihood is L = 0.22, so we seek the values of p 1 that
give a likelihood of at least 0.22 / 7 = 0.031. Figure A.12 shows that
this range runs from p 1 = 0.07 to p 1 = 0.41. We are 95 percent confident that the true
value of the allele frequency lies somewhere in that range.
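The seven-fold rule can be applied numerically in the same spirit. This sketch assumes a grid of candidate frequencies spaced 0.001 apart and keeps every value whose likelihood is at least one-seventh of the maximum. The names `threshold`, `lower`, and `upper` are illustrative only.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Binomial probability of k copies among n genes at frequency p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

grid = [i / 1000 for i in range(1001)]  # assumed grid of candidate p1 values
likelihoods = [binomial_likelihood(4, 20, p) for p in grid]

L_max = max(likelihoods)
threshold = L_max / 7  # likelihoods below this fall outside the interval

# Keep the candidate frequencies whose likelihood clears the threshold
inside = [p for p, L in zip(grid, likelihoods) if L >= threshold]
lower, upper = min(inside), max(inside)
print(round(threshold, 3), round(lower, 2), round(upper, 2))  # 0.031 0.07 0.41
```

Because the likelihood function here has a single peak, the retained values form one unbroken range, so taking the smallest and largest is enough to describe the interval.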
This example illustrates two of the major applications of the likelihood
approach: using maximum likelihood to estimate something about the population
(such as its mean), and finding the confidence interval for that quantity. Likelihood
is used for a broad range of problems in evolutionary biology, such as estimating
phylogenies and effective population sizes. The key requirement is that we be able
to calculate the probability of the data given assumptions about how they were
produced.
Bayesian Inference
An alternative to likelihood that is increasingly used in many areas of evolutionary
biology is Bayesian inference. The goal here is to find the probability that the allele
frequency (or other variable) in the population is equal to any given value. There
are two main motivations for using the Bayesian approach. The first is to make use
of information that we already have. Likelihood has no way of combining prior
information with new data, but Bayesian inference does. With little or no new
data, Bayesian estimates rely heavily on the prior information. But as more and
more new data are gathered, they are given more and more weight. With enough
new data, the prior information has a negligible effect on our estimate.
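The shifting balance between prior information and new data can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration: the posterior is approximated on a grid of candidate frequencies, the prior is shaped like the likelihood from an earlier sample with 4 copies of A 1 among 20 genes, and the two new samples are hypothetical.

```python
def kernel(k, n, p):
    """Binomial likelihood up to a constant factor. The constant cancels
    when the probabilities are normalized to sum to 1."""
    return p**k * (1 - p)**(n - k)

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

grid = [i / 1000 for i in range(1001)]  # assumed grid of candidate p1 values

# ASSUMPTION: prior shaped by an earlier sample (4 copies of A1 in 20 genes)
prior = normalize([kernel(4, 20, p) for p in grid])

def posterior_mean(k, n):
    """Posterior mean of p1 after seeing k copies of A1 in n new genes."""
    post = normalize([pr * kernel(k, n, p) for pr, p in zip(prior, grid)])
    return sum(p * w for p, w in zip(grid, post))

# HYPOTHETICAL new samples, both with an observed frequency of 0.5
small = posterior_mean(2, 4)      # little new data: estimate stays near the prior
large = posterior_mean(200, 400)  # much new data: estimate moves close to 0.5
print(round(small, 2), round(large, 2))
```

With only four new genes the estimate remains close to the prior value of about 0.2, while with 400 new genes it lies close to the frequency observed in the new data.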
Say, for example, that after sampling 20 gene copies from platypuses living in the
first river, we move to a second river nearby. We think that platypuses migrate
back and forth between the rivers, so we expect allele frequencies in the two
populations to be similar. We can therefore use our first sample to form a prior
Futuyma Kirkpatrick Evolution, 4e
Sinauer Associates
[Figure A.12: plot of L, the likelihood of p 1 (vertical axis, with 0.031 marked), against the value of p 1 (horizontal axis). Labeled features: maximum likelihood, L = 0.22; maximum likelihood estimate of p 1 = 0.2; confidence interval, 0.07 < p 1 < 0.41.]
FIGURE A.12 The likelihood function for p 1 , the frequency
of allele A 1 in the population, given that we have a sample
with 4 copies of allele A 1 and 16 copies of allele A 2. The
likelihood reaches a maximum value of L = 0.22 when
p 1 = 0.2. The confidence interval is the range of values in
which we are 95 percent sure that the true value of the
allele frequency lies (the shaded box). It corresponds
approximately to values of p 1 that give likelihoods no more
than seven times smaller than the maximum likelihood, that
is, L greater than 0.22 / 7 = 0.031. The confidence interval
ranges from p 1 = 0.07 to p 1 = 0.41.