Evolution, 4th Edition

412 CHAPTER 16

One widely used method for estimating phylogenies is based on likelihood. This is a general statistical approach that is described in the Appendix. Here we illustrate how likelihood is used to estimate a phylogeny with a very simple example. This is advanced material that can be considered optional. The data are from the example used earlier to illustrate parsimony, with the DNA bases at three sites in the genomes of three species (see Figure 16.12). The first step is to calculate the probability that these data would be observed, given the phylogenetic tree. The second step is to find which of all possible phylogenies maximizes that probability, which gives us the maximum likelihood estimate for the phylogeny.

We begin by focusing on just the first of the three bases (FIGURE 16.A1). To find the likelihood, we need to make assumptions about how this base evolves. Here we make the simple assumptions that the probability that a substitution occurs (that is, one base replaces another) is constant in time and equal for all possible changes (for example, from C to G, or from A to T). For the moment, we will also assume that we know, from outgroup data, that the base in the MRCA of these three species was an A.

The likelihood of the data depends on three things: the topology (or branching order) of the tree, the lengths of the branches (measured in millions of years), and the substitution rate (that is, the probability per million years that one base will be replaced by another). The three possible topologies are shown in Figure 16.A1. For each of them, we find the branch lengths (t₁ and t₂) that make the data most likely. We then choose the topology that has the highest likelihood. For tree A, the likelihood turns out to be given by a rather intimidating equation, Equation 16.A1.

Here λ is the substitution rate. In this example, we’ll assume that we know λ = 0.3 per million years, for example from a molecular clock that has been calibrated for this gene in related species. Other equations (which look quite similar) give the likelihoods for trees B and C. We will not explain here how this equation was derived, but the interested reader can find a clear explanation on p. 194 of [15].

FIGURE 16.A2 shows a plot of the likelihood as the branch lengths of tree A are varied. The maximum value of the likelihood is reached when t₁ (the time from the root to the speciation event between species 1 and species 2) is 2.7 million years, and when t₂ (the time from the speciation event to the present) is 0. The estimate that t₂ = 0 makes sense: it implies that the MRCA of species 1 and 2 also had a T, and that there has been no time for either lineage to have a substitution since then. By evaluating Equation 16.A1 with t₁ = 2.7 and t₂ = 0, we find that the maximum value of the likelihood for tree A is 0.083. Similar calculations for trees B and C show that their maximum likelihoods are both 0.016, which is about 5 times smaller than the value for tree A.

The data therefore suggest that tree A is the actual phylogeny. We are not very confident in this conclusion, though. The difference between the likelihoods is not statistically significant, which is not surprising since the data come from just a single DNA base. With more data, however, we become more certain about the phylogeny. Data from two additional DNA bases are shown in Figure 16.12. To make use of them, we assume that the bases have evolved independently and with the same substitution rate. In that case, we can simply multiply the likelihoods for each base calculated from Equation 16.A1 to find the overall likelihood of a given phylogeny. The second base shows exactly the same evo-

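$$
L(t_1, t_2) = \frac{1}{64}\,\exp\left\{-\tfrac{4}{3}(2t_1 + 3t_2)\lambda\right\}
\times \left[3 + \exp\left\{\tfrac{4}{3}(t_1 + t_2)\lambda\right\}\right]
\times \left[3\exp\left\{\tfrac{4}{3}t_1\lambda\right\} - 2\exp\left\{\tfrac{4}{3}t_2\lambda\right\} + \exp\left\{\tfrac{4}{3}(t_1 + 2t_2)\lambda\right\} - 2\right]
\tag{16.A1}
$$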
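As a numerical check on Equation 16.A1, the following Python sketch evaluates the likelihood of tree A with λ = 0.3 and searches a coarse grid of branch lengths for the maximum. The function names, the grid step of 0.05 million years, and the search limit of 10 million years are illustrative assumptions, not part of the text. The formula is coded as the product of the prefactor and the two bracketed terms.

```python
import math

LAMBDA = 0.3  # substitution rate per million years, the value assumed in the text


def likelihood_tree_a(t1, t2, lam=LAMBDA):
    """Likelihood of tree A for the first site, coded from Equation 16.A1."""
    k = 4.0 / 3.0 * lam
    prefactor = math.exp(-k * (2 * t1 + 3 * t2)) / 64.0
    first_bracket = 3.0 + math.exp(k * (t1 + t2))
    second_bracket = (
        3.0 * math.exp(k * t1)
        - 2.0 * math.exp(k * t2)
        + math.exp(k * (t1 + 2 * t2))
        - 2.0
    )
    return prefactor * first_bracket * second_bracket


def grid_maximum(step=0.05, t_max=10.0):
    """Brute-force search over a grid of (t1, t2); step and t_max are arbitrary choices."""
    n = int(round(t_max / step))
    best_value, best_t1, best_t2 = -1.0, None, None
    for i in range(n + 1):
        for j in range(n + 1):
            t1, t2 = i * step, j * step
            value = likelihood_tree_a(t1, t2)
            if value > best_value:
                best_value, best_t1, best_t2 = value, t1, t2
    return best_value, best_t1, best_t2


if __name__ == "__main__":
    print("L(2.7, 0) =", round(likelihood_tree_a(2.7, 0.0), 3))
    value, t1, t2 = grid_maximum()
    print("grid maximum:", round(value, 3), "at t1 =", round(t1, 2), "t2 =", round(t2, 2))
```

With these settings the search should return a maximum close to 0.083 near t₁ = 2.7 and t₂ = 0, in line with the values quoted in the text. A finer grid or a numerical optimizer would locate the maximum more precisely, but a grid keeps the logic easy to follow.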

BOX 16A

Estimating Trees with Likelihood

Futuyma Kirkpatrick Evolution, 4e Sinauer Associates Troutt Visual Services Evolution4e_Box16.A1.ai Date 01-25-2017


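[Figure 16.A1: three tree diagrams, labeled Tree A, Tree B, and Tree C. Each tree has the base A at its root and the bases T, T, and A at the tips for Sp 1, Sp 2, and Sp 3, respectively. The branch times t₁ and t₂ are marked.]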

FIGURE 16.A1 The three possible topologies (shapes) for the phylogeny of three species and the base that they each have at a site in the genome. We assume that data from outgroup species tell us that the ancestor of the three species had an A. Species 1 and species 2 have a T, while species 3 has an A. The time from the tree’s root to the speciation event is t₁, and the time from that event to the present is t₂.
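The likelihood of each of the three topologies can also be computed directly, without a closed-form equation, by summing over the four bases that the unobserved ancestor at the speciation event might have had. The Python sketch below does this under the box's assumptions (equal rates for all substitutions, root base A, λ = 0.3). The transition probabilities are those of the equal-rates model, often called the Jukes-Cantor model, a name the box itself does not use. The function names, the grid search, and the choice of which of the two remaining topologies is labeled B and which C are assumptions made for illustration.

```python
import math

BASES = "ACGT"
LAMBDA = 0.3     # substitution rate per million years (value assumed in the text)
ROOT_BASE = "A"  # base in the MRCA, taken as known from outgroups

# Each entry: (bases of the two sister species, base of the remaining species).
# Sp 1 = T, Sp 2 = T, Sp 3 = A. Which of the last two trees is "B" and which
# is "C" is an assumption here; both give the same likelihood.
TREES = {
    "A": (("T", "T"), "A"),  # Sp 1 and Sp 2 are sisters
    "B": (("T", "A"), "T"),  # Sp 1 and Sp 3 are sisters
    "C": (("T", "A"), "T"),  # Sp 2 and Sp 3 are sisters
}


def transition_prob(start, end, t, lam=LAMBDA):
    """Probability that base `start` is `end` after time t (equal-rates model)."""
    decay = math.exp(-4.0 / 3.0 * lam * t)
    if start == end:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def site_likelihood(tree, t1, t2):
    """Probability of the observed bases at one site, given a tree and branch times."""
    (sister_1, sister_2), other = TREES[tree]
    # The lineage that splits off at the root evolves for t1 + t2.
    long_branch = transition_prob(ROOT_BASE, other, t1 + t2)
    # Sum over the unknown base x at the speciation event.
    clade = sum(
        transition_prob(ROOT_BASE, x, t1)
        * transition_prob(x, sister_1, t2)
        * transition_prob(x, sister_2, t2)
        for x in BASES
    )
    return long_branch * clade


def max_likelihood(tree, step=0.1, t_max=10.0):
    """Grid search for the largest likelihood; returns (likelihood, t1, t2)."""
    n = int(round(t_max / step))
    return max(
        (site_likelihood(tree, i * step, j * step), i * step, j * step)
        for i in range(n + 1)
        for j in range(n + 1)
    )


if __name__ == "__main__":
    for name in TREES:
        value, t1, t2 = max_likelihood(name)
        print(f"Tree {name}: max likelihood {value:.3f} at t1 = {t1:.1f}, t2 = {t2:.1f}")
```

Run as a script, this should report a maximum likelihood of about 0.083 for tree A and about 0.016 for trees B and C, matching the values given in the box. Summing over the internal base is the same calculation that the closed-form equation for tree A encodes, so the two approaches should agree for any t₁ and t₂.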
