Telling the Evolutionary Time: Molecular Clocks and the Fossil Record

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods have made Bayesian approaches
computationally feasible (see Thorne et al. 1998; Huelsenbeck et al. 2000). MCMC is
used to approximate the posterior distributions of model parameters. Given a particular
tree and dataset, new parameter values are repeatedly proposed and evaluated until the
chain ‘converges’, at which point it is said to have achieved stationarity and its samples
approximate the posterior distribution. The analysis is effective and thorough, although
computationally time-consuming. MCMC has been used to model variable evolutionary rates
within trees, assuming various distributions including the log-normal (Thorne et al. 1998)
and compound Poisson (Huelsenbeck et al. 2000).
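
The sampling idea can be illustrated with a deliberately small sketch. It is not the
model of Thorne et al. (1998) or Huelsenbeck et al. (2000): it assumes a single branch of
known duration, a Poisson count of substitutions along it, and an exponential prior on the
rate. The function names, the prior and all numerical values are hypothetical and chosen
only for illustration.

```python
import math
import random


def log_posterior(rate, substitutions, branch_time, prior_mean):
    """Unnormalised log posterior of one substitution rate (toy model)."""
    if rate <= 0:
        return -math.inf
    expected = rate * branch_time
    # Poisson log likelihood with the constant term dropped.
    log_likelihood = substitutions * math.log(expected) - expected
    # Exponential prior with the given mean (an assumption of this sketch).
    log_prior = -rate / prior_mean
    return log_likelihood + log_prior


def metropolis(substitutions, branch_time, prior_mean,
               n_steps=20000, burn_in=2000, step=0.05, seed=1):
    """Random-walk Metropolis sampler; returns post-burn-in rate samples."""
    rng = random.Random(seed)
    rate = prior_mean
    current = log_posterior(rate, substitutions, branch_time, prior_mean)
    samples = []
    for i in range(n_steps):
        proposal = rate + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, substitutions, branch_time, prior_mean)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        if i >= burn_in:
            samples.append(rate)
    return samples
```

Samples drawn before the chain settles are discarded as burn-in; the remaining samples are
then summarised, for example by their mean or by quantiles, to describe the posterior
distribution of the rate.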


Non-parametric rate smoothing

Along with the parametric methods outlined above for estimating divergence times,
non-parametric approaches are also available (e.g. non-parametric rate smoothing, NPRS;
Sanderson 1997). NPRS estimates divergence times on a given tree by converting branch
lengths to rates and minimizing the rate changes between ancestral and descendant
branches, on the assumption that rates are correlated between them.
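
A minimal sketch of the smoothing idea follows. It is written in the spirit of NPRS
rather than as a reproduction of Sanderson's (1997) procedure. It assumes a hypothetical
three-taxon tree ((A,B)X,C) with the root age fixed at 1, a single unknown node age, and a
simple grid search. The treatment of the two branches that meet at the root is a
simplification of this sketch, and the function names and branch lengths are illustrative.

```python
def branch_rates(node_age, lengths, root_age=1.0):
    """Rates (length / duration) for the tree ((A,B)X,C), given the age of node X."""
    return {
        "A": lengths["A"] / node_age,
        "B": lengths["B"] / node_age,
        "X": lengths["X"] / (root_age - node_age),
        "C": lengths["C"] / root_age,
    }


def roughness(node_age, lengths, root_age=1.0):
    """Sum of squared rate changes between neighbouring branches."""
    r = branch_rates(node_age, lengths, root_age)
    # Branch leading to X compared with its two descendant branches.
    penalty = (r["X"] - r["A"]) ** 2 + (r["X"] - r["B"]) ** 2
    # Assumption of this sketch: the two root branches are treated as neighbours.
    penalty += (r["X"] - r["C"]) ** 2
    return penalty


def smooth_node_age(lengths, root_age=1.0, grid=1000):
    """Grid search for the age of X that minimises the roughness penalty."""
    candidates = [root_age * i / grid for i in range(1, grid)]
    return min(candidates, key=lambda age: roughness(age, lengths, root_age))
```

With perfectly clock-like branch lengths the penalty can be driven to zero, and the
search recovers the node age implied by a constant rate. With rate variation among
branches it returns the age that makes the inferred rates change as little as possible.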


Increased complexity at a price?

As models become more sophisticated and parameter-rich they provide a better fit to the
observed patterns of evolution of individual datasets. This is reflected in the general
improvement in tree likelihood values with increased model complexity. However, as
methods increase in statistical complexity, they also become more computationally
expensive and less user-friendly. Moreover, where their assumptions are questionable, such
methods may perform no better than statistically simpler alternatives.
As the number of parameters increases, so does the variance of the
estimates yielded, as reflected in the ever-widening confidence intervals.


Working example

A small-scale study of metazoan divergence was performed using the above criteria. Taxa
chosen were Homo sapiens, Rattus norvegicus, Gallus gallus, Drosophila melanogaster,
Caenorhabditis elegans, Saccharomyces cerevisiae and outgroup Arabidopsis thaliana.
Phosphoglycerate kinase (PGK), replication factor C (RFC), nucleoside diphosphate
kinase (NDK) and triose phosphate isomerase (TPI) nuclear gene sequences were obtained
from GenBank (accession numbers and alignments available from the authors on request).
Sequences were aligned by manual inspection, and gapped sections and regions of
ambiguous alignment were removed. ML trees were constructed using PAUP* (version 4.0b8;
Swofford 1998) from individual genes and from the concatenated data, using first codon
positions, second codon positions, and the two combined. All analyses used the general
time-reversible model, with gamma and invariant-sites values estimated from the data.
Trees were rooted using plant sequences as outgroups and ML trees were found by an
exhaustive search. Topologies shown in Figure 3.2 were compared using the Shimodaira-
Hasegawa (SH) test (Shimodaira and Hasegawa 1999) for differences in fit.
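
The data-preparation step described above (extracting first and second codon positions
and concatenating genes) can be sketched as follows. This is an illustrative outline, not
the procedure actually used in the study. It assumes each alignment is a mapping from
taxon name to an in-frame coding sequence that starts at codon position 1, with gaps
already removed as whole codons. The function names are hypothetical.

```python
def codon_sites(sequence, positions):
    """Keep only the sites at the given codon positions (1, 2 or 3)."""
    offsets = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(sequence) if i % 3 in offsets)


def concatenate_genes(gene_alignments, positions):
    """Concatenate the chosen codon positions of several genes, taxon by taxon."""
    taxa = sorted(gene_alignments[0])
    combined = {taxon: "" for taxon in taxa}
    for alignment in gene_alignments:
        # Every gene must be sampled for the same set of taxa.
        if sorted(alignment) != taxa:
            raise ValueError("every gene must contain the same taxa")
        for taxon in taxa:
            combined[taxon] += codon_sites(alignment[taxon], positions)
    return combined
```

Calling `concatenate_genes(alignments, (1, 2))` would give the combined first and second
position dataset, while `(1,)` or `(2,)` would give the single-position datasets.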

