The amino acid substitution models used in this study (Table 1.1) are all special forms
of the model of Yang et al. (1998), which is based on the empirical matrix of Jones et al.
(1992), with amino acid frequencies set as free parameters (referred to as JTT-F).
Substitution-rate variation from site to site is accommodated in the substitution models
using the discrete-gamma approximation of Yang (1996a) with eight equally probable
categories of rates to approximate the continuous gamma distribution (referred to as dG
models). The transition probability matrices of the models and details of parameter
estimation are given in Yang (2000).
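As an illustration of the discrete-gamma idea, the following minimal sketch computes the rates of eight equally probable categories for a given shape parameter α. It assumes a gamma distribution with mean fixed at one and represents each category by its mean rate; the function names are hypothetical, and the numerical routines (a power series for the incomplete gamma function and bisection for the quantiles) are simple stand-ins, not the CODEML implementation.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    # Sum x^n / (a (a+1) ... (a+n)) until the terms become negligible.
    while term > 1e-16 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    value = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, value)

def gamma_quantile(p, a):
    """Quantile of a gamma(shape a, scale 1) distribution, by bisection."""
    hi = 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k=8):
    """Mean rate of each of k equally probable categories (overall mean 1)."""
    # Category boundaries on the scale of a gamma(alpha, 1) variable.
    cuts = [gamma_quantile(i / k, alpha) for i in range(1, k)]
    # For a mean-one gamma, the integral of r * g(r) up to a boundary equals
    # P(alpha + 1, boundary); differences give each category's share of the mean.
    cum = [0.0] + [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]

if __name__ == "__main__":
    for alpha in (0.06, 0.86):
        rates = discrete_gamma_rates(alpha)
        print(alpha, [f"{r:.3g}" for r in rates])
```

With a small α nearly all categories receive rates close to zero and a single category carries most of the change, whereas with α near one the rates are spread far more evenly.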
Likelihood ratio tests are applied to test several hypotheses of interest. For a given tree
topology (e.g. that shown in Figure 1.1), a model (H₁) containing p free parameters and with
likelihood L₁ fits the data significantly better than a nested sub-model (H₀) with q = p − n
free parameters (i.e. n restrictions) and likelihood L₀ if the deviance
D = −2 log Λ = 2(log L₁ − log L₀), where Λ = L₀/L₁, falls in
the rejection region of a χ² distribution with n degrees of freedom (Yang 1996b). We use
several starting values in the iterations to guard against the possible existence of multiple
local optima. These analyses are conducted with the CODEML program from the PAML
package, version 3.0b (Yang 2000).
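The likelihood ratio test described above can be sketched as follows. The log-likelihood values in the example are hypothetical placeholders, not values from Table 1.1, and the function names are illustrative. The χ² tail probability is computed from a standard recurrence for integer degrees of freedom, so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q = math.exp(-half)              # closed form for df = 2
        k = 2
    else:
        q = math.erfc(math.sqrt(half))   # closed form for df = 1
        k = 1
    # Step from k to k + 2 degrees of freedom:
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def likelihood_ratio_test(log_l1, log_l0, n):
    """Return the deviance D = 2(log L1 - log L0) and its chi-square p-value."""
    d = 2.0 * (log_l1 - log_l0)
    return d, chi2_sf(d, n)

if __name__ == "__main__":
    # Hypothetical log-likelihoods for a general model and a nested sub-model
    # that differ by n = 1 free parameter.
    d, p_value = likelihood_ratio_test(-1000.0, -1010.0, 1)
    print(f"D = {d:.2f}, p = {p_value:.3g}")
```

The sub-model is rejected when the p-value falls below the chosen significance level.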
Evolution of six genes in Diptera
Table 1.1 shows the log-likelihood ratio statistic values for models of protein evolution
assuming the tree topology shown in Figure 1.1. The best description of the substitution
process of ADH, AMD, DDC, GPDH, SOD, and XDH is provided by the JTT-F+dG
model, which treats amino acid frequencies as free parameters and allows variable
replacement rates among sites. The discrete gamma distribution that best
accommodates the variation of the replacement rate from site to site along GPDH is
extremely L-shaped (α=0.06; i.e. α≪1), reflecting that most (216/241; i.e. ≈90 per
cent) of the aligned residues are conserved in dipterans; the proportion of conserved residues
rises to 95 per cent when comparisons are confined to the genus Drosophila. Among-site rate
variation is also extreme, although less so, in DDC (α=0.16), AMD (α=0.21), and SOD (α=0.22),
moderate in XDH (α=0.45), and lowest in ADH (α=0.86), indicating that ADH is the
least constrained (note, however, that after removal of the two tephritid sequences from
the ADH alignment the value of α decreases to 0.53, meaning that the low conservation of
ADH in dipterans is to a great extent due to the dramatic divergence of this protein
between drosophilids and tephritids).
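To make the meaning of these α values more concrete, the short sketch below ranks the six genes by the variance of the relative rate among sites. It assumes the usual convention that the gamma distribution has mean one, in which case the variance is 1/α; the α values are those quoted in the text, and the variable and function names are illustrative.

```python
# Shape estimates quoted in the text for the full dipteran alignments.
alpha_by_gene = {
    "GPDH": 0.06, "DDC": 0.16, "AMD": 0.21,
    "SOD": 0.22, "XDH": 0.45, "ADH": 0.86,
}

def rate_variance(alpha):
    """Variance of a gamma distribution with shape alpha and mean one."""
    return 1.0 / alpha

# Smaller alpha means stronger rate heterogeneity, so sort by increasing alpha.
ranked = sorted(alpha_by_gene, key=alpha_by_gene.get)

if __name__ == "__main__":
    for gene in ranked:
        a = alpha_by_gene[gene]
        var = rate_variance(a)
        print(f"{gene}: alpha={a:.2f}, variance={var:.2f}, CV={var ** 0.5:.2f}")
```

Under this assumption the relative rates of GPDH have a variance more than ten times that of ADH, which is consistent with the ordering described above.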
The substitution models and estimates of the among-site rate variation obtained above
are used for calculating amino acid distances between pairs of sequences. The results are
summarized in Table 1.2 and Figure 1.2. For every gene except XDH, the rate of
evolution varies from one taxonomic level of comparison to another. For some genes, the rate is generally fastest
for comparisons between species from different families (Drosophilidae and Tephritidae).
The rate of evolution is faster for comparisons between drosophilid genera (Di) than
between species of the Drosophila genus (Da) for SOD and GPDH, whereas the opposite is
the case for AMD; the two rates are fairly similar for the other three genes. Specific
comments about individual genes follow.
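The effect of among-site rate variation on an estimated distance can be illustrated with a much simpler formula than the likelihood-based distances used here. The sketch below is a deliberate substitution: it uses the gamma-corrected Poisson distance, d = α[(1 − p)^(−1/α) − 1], where p is the proportion of differing sites, rather than the JTT-F+dG model. The function names and the value p = 0.1 are illustrative assumptions.

```python
import math

def poisson_distance(p):
    """Poisson-corrected distance, assuming equal rates at all sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return -math.log(1.0 - p)

def gamma_poisson_distance(p, alpha):
    """Poisson distance corrected for gamma-distributed rates among sites."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)

if __name__ == "__main__":
    p = 0.1  # hypothetical proportion of differing sites
    print(f"equal rates: d = {poisson_distance(p):.4f}")
    for alpha in (0.06, 0.22, 0.86):  # shape values quoted in the text
        print(f"alpha = {alpha}: d = {gamma_poisson_distance(p, alpha):.4f}")
```

For the same observed proportion of differences, a smaller α yields a larger estimated distance, which is why the α estimate of each gene has to enter the distance calculation.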
12 FRANCISCO RODRÍGUEZ-TRELLES ET AL.