Evolution, 4th Edition

A STATISTICS PRIMER A–13

We compare the two likelihoods, and then use a probabilistic rule (see https://en.wikipedia.org/wiki/Metropolis_Hastings_algorithm) that tells us whether to keep the first value for the node’s age and discard the second, or to do the reverse. We record the age that is retained, then repeat the process. After thousands (or even millions) of repetitions, the distribution of ages that we retained will very closely resemble the posterior probability distribution for the node’s age. We can use the distribution of retained values to estimate the node’s age (the age that has the greatest probability) and the confidence interval for that estimate. This method is one of several that collectively are called Markov Chain Monte Carlo, often abbreviated as MCMC. In this example, the aim is to estimate a single quantity (the age of a node). In practice, the method is typically used to do much more ambitious jobs, such as simultaneously estimating the branching pattern of the phylogeny, the ages of all its nodes, and the rates of sequence evolution.
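The keep-or-discard loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular phylogenetics program. It assumes a symmetric random-walk proposal (the Metropolis special case of the linked algorithm), and it compares unnormalized posterior values, which reduces to comparing likelihoods when the prior is flat. The function `toy_log_posterior`, its center of 10 and spread of 2, the starting value, the step size, and the number of discarded early values are all hypothetical choices made only so the example runs.

```python
import math
import random


def metropolis(log_post, start, step, n_iter, rng):
    """Random-walk Metropolis sampler for one quantity (here, a node age)."""
    current = start
    current_lp = log_post(current)
    kept = []
    for _ in range(n_iter):
        # Propose a second value near the current one (symmetric proposal).
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Probabilistic rule: always keep a more probable proposal; keep a
        # less probable one with probability equal to the posterior ratio.
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        kept.append(current)  # record whichever value was retained
    return kept


def toy_log_posterior(age):
    """HYPOTHETICAL stand-in for a node-age posterior (log scale, unnormalized)."""
    if age <= 0.0:
        return -math.inf  # ages must be positive
    return -0.5 * ((age - 10.0) / 2.0) ** 2


rng = random.Random(1)
samples = metropolis(toy_log_posterior, start=5.0, step=1.0, n_iter=20000, rng=rng)
retained = sorted(samples[2000:])  # drop early values that still reflect the start
estimate = sum(retained) / len(retained)
lower = retained[int(0.025 * len(retained))]
upper = retained[int(0.975 * len(retained))]
print(f"estimate ~ {estimate:.2f}, 95% interval ~ ({lower:.2f}, {upper:.2f})")
```

The sketch summarizes the retained values by their mean and by the central 95% of the sorted values. The text's point estimate is the most probable age, which for a symmetric target like this toy one lies close to the mean.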


[Figure A.13, panels (A), (B), and (C): each panel shows the prior, likelihood, and posterior curves. Horizontal axis: value of p1 (0 to 1). Vertical axis: probability density of p1.]

FIGURE A.13 Bayesian estimates for the frequency of allele A1 in a second population of platypuses. The likelihood function from the first population (Figure A.12) is used for the prior distribution. The actual frequency in the second population is p1 = 0.4 (the red circle). (A) A sample of just four alleles from the second population has one copy of A1 and three copies of A2. The resulting likelihood function is quite flat. The posterior distribution (proportional to the product of the prior distribution and the likelihood function) is very similar to the prior distribution. The peak of the posterior distribution is at p1 = 0.21 (the black diamond). (B) With a sample of 20 alleles, we have 8 copies of A1 and 12 copies of A2. The likelihood function is more strongly peaked because of the larger sample size. The posterior distribution now estimates that the frequency of A1 is p1 = 0.3. (C) With a sample of 100 alleles, we have 37 copies of A1 and 63 copies of A2. The posterior distribution is now even more strongly peaked, and nearly centered on the true allele frequency of p1 = 0.4. Our estimate for the frequency of A1 is now p1 = 0.34.
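The three posterior peaks in the caption can be approximated with a short grid calculation. The Python sketch below multiplies a prior by a binomial likelihood at evenly spaced values of p1 and reports where the product is largest. The sample counts (1 of 4, 8 of 20, 37 of 100) come from the caption. The first-population counts used for the prior, 4 copies of A1 among 20 alleles, are an assumption made for illustration and are not taken from Figure A.12. The function names and the grid resolution are likewise hypothetical.

```python
import math


def log_binomial_likelihood(p, k, n):
    """Log-likelihood, up to a constant, of k copies of A1 among n alleles."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def posterior_mode(prior_k, prior_n, k, n, steps=1000):
    """Value of p on a grid where prior times likelihood is largest."""
    grid = [i / steps for i in range(1, steps)]  # skip p = 0 and p = 1
    log_post = [
        # Prior is the first population's likelihood; adding logs multiplies.
        log_binomial_likelihood(p, prior_k, prior_n)
        + log_binomial_likelihood(p, k, n)
        for p in grid
    ]
    best = max(range(len(grid)), key=lambda i: log_post[i])
    return grid[best]


# ASSUMPTION: first-population sample of 4 copies of A1 in 20 alleles.
PRIOR_K, PRIOR_N = 4, 20

# Second-population samples from the caption: (copies of A1, alleles sampled).
for k, n in [(1, 4), (8, 20), (37, 100)]:
    mode = posterior_mode(PRIOR_K, PRIOR_N, k, n)
    print(f"sample of {n}: posterior peak near p1 = {mode:.2f}")
```

Under the assumed prior counts, the peaks fall near 0.21, 0.30, and 0.34. A different first-population sample would shift them, most noticeably for the smallest second-population sample, where the prior dominates.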
