412 CHAPTER 16
One widely used method for estimating phy-
logenies is based on likelihood. This is a gen-
eral statistical approach that is described in the
Appendix. Here we illustrate how likelihood is
used to estimate a phylogeny with a very simple
example. This is advanced material that can be
considered optional.
The data are from the example used earlier to
illustrate parsimony, with the DNA bases at three
sites in the genomes of three species (see Figure
16.12). The first step is to calculate the probability
that these data would be observed, given the
phylogenetic tree. The second step is to find
which of all possible phylogenies maximizes that
probability, which gives us the maximum likeli-
hood estimate for the phylogeny.
We begin by focusing on just the first of the three bases
(FIGURE 16.A1). To find the likelihood, we need to make
assumptions about how this base evolves. Here we make
the simple assumptions that the probability that a substitu-
tion occurs (that is, one base replaces another) is constant
in time and equal for all possible changes (for example,
from C to G, or from A to T). For the moment, we will also
assume that we know, from outgroup data, that
the base in the MRCA of these three species was an A.
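The substitution model just described (a constant rate, equal for all possible changes) is usually called the Jukes-Cantor model. As a minimal sketch, assuming that the substitution rate is the total rate at which a base is replaced by any of the other three, the probability that a site shows the same base, or one particular different base, after t million years can be computed as below. The function names `p_same` and `p_diff` and the rate value are ours, chosen only for illustration.

```python
import math

def p_same(t, lam):
    """Probability that a site carries the same base after t million years."""
    return 0.25 + 0.75 * math.exp(-4.0 * lam * t / 3.0)

def p_diff(t, lam):
    """Probability that a site carries one particular different base after t."""
    return 0.25 - 0.25 * math.exp(-4.0 * lam * t / 3.0)

lam = 0.3  # illustrative rate per million years (an assumption at this point)
for t in (0.0, 1.0, 10.0):
    # The four outcomes (same base, or one of three others) must sum to 1.
    total = p_same(t, lam) + 3 * p_diff(t, lam)
    print(t, round(p_same(t, lam), 3), round(p_diff(t, lam), 3), round(total, 3))
```

With no elapsed time the base is certain to be unchanged, and after a very long time all four bases become equally likely (probability 1/4 each), whatever the starting base.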
The likelihood of the data depends on three things: the
topology (or branching order) of the tree, the lengths of
the branches (measured in millions of years), and the sub-
stitution rate (that is, the probability per million years that
one base will be replaced by another). The three possible
topologies are shown in Figure 16.A1. For each of them,
we find the lengths of the branches (t₁ and t₂) that make
the data most likely. We then choose the topology that has
the highest likelihood. For tree A, the likelihood turns out
to be given by the rather intimidating Equation 16.A1.
Here λ is the substitution rate. In this example, we’ll
assume that we know λ = 0.3 per million years, for example
from a molecular clock that has been calibrated for this gene
in related species. Other equations (which look quite similar) give the
likelihoods for trees B and C. We will not explain here how
this equation was derived, but the interested reader can
find a clear explanation on p. 194 of [15].
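Although we do not derive Equation 16.A1 here, a likelihood of this kind can be computed directly by summing over the unknown base in the ancestor of species 1 and 2. The sketch below does this for tree A. It assumes the Jukes-Cantor form of the substitution probabilities, with λ taken as the total rate of change away from a base, and the helper names `p` and `likelihood_tree_a` are ours.

```python
import math

BASES = "ACGT"

def p(x, y, t, lam):
    """Jukes-Cantor probability that base x has become base y after time t."""
    e = math.exp(-4.0 * lam * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def likelihood_tree_a(t1, t2, lam=0.3):
    """Likelihood of T, T, A in species 1, 2, 3, given an A at the root."""
    # Species 3 descends directly from the root over the whole time t1 + t2.
    to_sp3 = p("A", "A", t1 + t2, lam)
    # The ancestor of species 1 and 2 is unobserved, so sum over its base x.
    to_sp1_sp2 = sum(p("A", x, t1, lam) * p(x, "T", t2, lam) ** 2 for x in BASES)
    return to_sp3 * to_sp1_sp2

print(round(likelihood_tree_a(2.7, 0.0), 3))  # 0.083
```

Evaluated at t₁ = 2.7 and t₂ = 0, this gives the maximum likelihood of 0.083 reported for tree A, which suggests that the assumed form of the model matches the one used in this box.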
FIGURE 16.A2 shows a plot of the likelihood as the
branch lengths of tree A are varied. The maximum value of
the likelihood is reached when t₁ (the time from the root to
the speciation event between species 1 and species 2) is
2.7 million years, and when t₂ (the time from the speciation
event to the present) is 0. The estimate that t₂ = 0 makes
sense: it implies that the MRCA of species 1 and 2 also had
a T, and that there has been no time for either lineage to
have a substitution since then.
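The search for the branch lengths that maximize the likelihood can be sketched as a simple grid search. The compact likelihood expression below is our own algebraic simplification under the Jukes-Cantor assumption (with λ as the total substitution rate), not a transcription of the printed equation, and the grid spacing of 0.05 million years is an arbitrary choice.

```python
import math

def likelihood_tree_a(t1, t2, lam=0.3):
    """Tree A likelihood in a compact form (our own simplification)."""
    e1 = math.exp(-4.0 * lam * t1 / 3.0)
    e2 = math.exp(-4.0 * lam * t2 / 3.0)
    return (1 + 3 * e1 * e2) * (1 + 3 * e2**2 - 2 * e1 * e2 * (1 + e2)) / 64.0

# Evaluate the likelihood on a grid of branch lengths from 0 to 10 million years.
step = 0.05
grid = [i * step for i in range(201)]
best_L, best_t1, best_t2 = max(
    (likelihood_tree_a(t1, t2), t1, t2) for t1 in grid for t2 in grid
)
print(f"t1 = {best_t1:.2f}, t2 = {best_t2:.2f}, likelihood = {best_L:.3f}")
# t1 = 2.75, t2 = 0.00, likelihood = 0.083
```

On this grid the maximum falls at t₂ = 0 and t₁ = 2.75, close to the estimate of 2.7 million years. Real phylogenetic software uses numerical optimizers rather than a grid, but the idea is the same.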
By evaluating Equation 16.A1 with t₁ = 2.7 and t₂ = 0, we
find that the maximum value of the likelihood for tree A is 0.083.
Similar calculations for trees B and C show that their
maximum likelihoods are both 0.016, which is about 5 times
smaller than the value for tree A. The data therefore suggest
that tree A is the actual phylogeny. We are not very confident
in this conclusion, though. The difference between the
likelihoods is not statistically significant, which is not
surprising since the data come from just a single DNA base.
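The comparison among the three topologies can be sketched in the same way. The code below assumes the Jukes-Cantor model with λ = 0.3 as the total substitution rate. It also assumes that tree B pairs species 1 with species 3 and that tree C pairs species 2 with species 3. That is our reading of the figure, and the function names are ours.

```python
import math

BASES = "ACGT"
TIPS = {1: "T", 2: "T", 3: "A"}  # observed bases in species 1, 2, and 3

def p(x, y, t, lam):
    """Jukes-Cantor probability that base x has become base y after time t."""
    e = math.exp(-4.0 * lam * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def likelihood(outgroup, t1, t2, lam=0.3, root="A"):
    """Likelihood when `outgroup` splits off at the root and the other two
    species share an ancestor that lived t1 million years after the root."""
    s1, s2 = [s for s in TIPS if s != outgroup]
    lone = p(root, TIPS[outgroup], t1 + t2, lam)
    # Sum over the unobserved base x in the ancestor of the two sister species.
    pair = sum(
        p(root, x, t1, lam) * p(x, TIPS[s1], t2, lam) * p(x, TIPS[s2], t2, lam)
        for x in BASES
    )
    return lone * pair

def max_likelihood(outgroup):
    """Largest likelihood on a grid of t1 and t2 from 0 to 10, step 0.1."""
    grid = [i * 0.1 for i in range(101)]
    return max(likelihood(outgroup, t1, t2) for t1 in grid for t2 in grid)

# Assumed pairing: tree A = (1,2),3; tree B = (1,3),2; tree C = (2,3),1.
for name, outgroup in (("A", 3), ("B", 2), ("C", 1)):
    print(name, round(max_likelihood(outgroup), 3))
# A 0.083
# B 0.016
# C 0.016
```

Trees B and C give identical values because each pairs a species that has a T with the species that has an A, so the two calculations differ only in the species labels.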
With more data, however, we become more certain
about the phylogeny. Data from two additional DNA
bases are shown in Figure 16.12. To make use of them, we
assume that the bases have evolved independently and
with the same substitution rate. In that case, we can simply
multiply the likelihoods for each base calculated from
Equation 16.A1 to find the overall likelihood of a given
phylogeny. The second base shows exactly the same evo-
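$$
L(t_1, t_2) = \frac{1}{64}\,
\exp\left\{-\tfrac{4}{3}(2t_1 + 3t_2)\lambda\right\}
\times \left[3 + \exp\left\{\tfrac{4}{3}(t_1 + t_2)\lambda\right\}\right]
\times \left[3\exp\left\{\tfrac{4}{3}t_1\lambda\right\} - 2\exp\left\{\tfrac{4}{3}t_2\lambda\right\} + \exp\left\{\tfrac{4}{3}(t_1 + 2t_2)\lambda\right\} - 2\right]
\tag{16.A1}
$$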
BOX 16A
Estimating Trees with Likelihood
Futuyma Kirkpatrick Evolution, 4e
Sinauer Associates
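[Figure 16.A1: three panels labeled Tree A, Tree B, and Tree C. Each tree has an A at its root and tips Sp 1 (T), Sp 2 (T), and Sp 3 (A). The tips appear in the order Sp 1, Sp 2, Sp 3 in Trees A and C, and in the order Sp 1, Sp 3, Sp 2 in Tree B. The branch times t₁ and t₂ are marked on Tree A.]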
FIGURE 16.A1 The three possible topologies (shapes) for the phylogeny
of three species and the base that they each have at a site in the
genome. We assume that we know, from data on outgroup species,
that the ancestor of the three species had an A, while species 1 and
species 2 have a T. The time from the tree’s root to the speciation event
is t₁, and the time from that event to the present is t₂.