-f a -N 100 -p 77 -x 933. Here, we use the LG model of
protein sequence evolution [42] with empirical equilibrium
frequencies for the 20 amino acids. We model substitution
rate heterogeneity across sites with a Γ-distribution and allow
a fraction of invariant sites. The odd numbers provided with
the options -p and -x specify random seeds for initializing the
parsimony stepwise addition and the rapid bootstrapping
procedure, respectively. Once RAxML has completed successfully,
it will have generated a number of output files. The one ending
with “bipartitionsBranchLabels.bs” contains the maximum
likelihood tree together with the branch support labels in Newick
format.
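Before opening the tree in a viewer, it can help to get a quick overview of the branch support values. The following is a minimal Python sketch, assuming that the support values appear as integers in square brackets after the branch lengths; the toy Newick string and the cutoff of 70 are illustrative assumptions, not part of the protocol.

```python
import re


def branch_supports(newick: str) -> list[int]:
    """Return all integer branch labels given in square brackets."""
    return [int(value) for value in re.findall(r"\[(\d+)\]", newick)]


# Hypothetical toy tree for illustration only (not real RAxML output).
example = "((A:0.1,B:0.2):0.05[98],(C:0.3,D:0.1):0.02[54],E:0.4);"

supports = branch_supports(example)
# The cutoff of 70 is an assumed, commonly quoted rule of thumb.
weak = [s for s in supports if s < 70]
print("support values:", supports)
print("weakly supported branches:", weak)
```

To apply this to your own result, read the content of the output file into a string and pass it to `branch_supports`.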
- For tree visualization, open the ML tree with FigTree
[36]. The program gives you a broad variety of options to
adjust the tree display. A maximum likelihood tree reconstruction
results in an unrooted tree, and thus the direction of time
in the tree remains unknown. To make tree interpretation more
intuitive, we recommend rooting the tree. If possible, place the
root on a branch leading to a known outgroup. If you have no a
priori knowledge about a possible outgroup in your data set,
you can still use a midpoint root. The root is then placed such
that it is approximately equidistant to all leaves (see Note 16).
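The idea behind midpoint rooting can be sketched in a few lines of Python. In its common formulation, the root is placed halfway along the longest path between any two leaves. This is only an illustration of the principle under that assumption, not the implementation used by FigTree; the edge list, the node names, and the function names are hypothetical.

```python
from collections import defaultdict


def farthest(adj, start):
    """Return the node farthest from start, plus distances and parents."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                parent[neighbor] = node
                stack.append(neighbor)
    return max(dist, key=dist.get), dist, parent


def midpoint(edges):
    """Locate the midpoint of the longest leaf-to-leaf path.

    edges: list of (node_u, node_v, branch_length) of an unrooted tree.
    Returns (u, v, offset): the root lies on edge (u, v), offset away from u.
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    # Two passes find the two most distant leaves a and b.
    a, _, _ = farthest(adj, edges[0][0])
    b, dist, parent = farthest(adj, a)
    half = dist[b] / 2.0
    # Walk from b back toward a until the halfway point is crossed.
    node = b
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Hypothetical unrooted tree with four leaves (A-D) and two inner nodes.
toy_edges = [
    ("A", "n1", 1.0),
    ("B", "n1", 2.0),
    ("n1", "n2", 1.0),
    ("C", "n2", 5.0),
    ("D", "n2", 1.0),
]
print(midpoint(toy_edges))
```

In this toy tree, the longest path runs between leaves B and C. The root therefore falls on the long branch leading to C, which illustrates why a single fast-evolving sequence can strongly influence the position of a midpoint root.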
- Interpret the evolutionary history of your protein on the basis
of the rooted tree. Again, we can give only very general
guidelines. The conceptual principles are outlined in Fig. 10, and a
real-world example (the evolutionary relationships of human
AMPKγ, with particular focus on its homologs in plants,
AMPKβγ and KING1) is shown in Fig. 11. As a start,
familiarize yourself with the evolutionary relationships of the
species whose sequences you are analyzing, i.e., the species
tree. When you are investigating the evolutionary history of
orthologs, the gene tree should be congruent with the species
tree (Fig. 10a). Incongruences between the gene tree and the
species tree indicate the presence of non-orthologous
sequences, problems during tree reconstruction, or both (see
Note 17; Fig. 10b). If you have combined orthologs for more
than one seed protein in the analysis, then you should see the
species tree reflected in the individual subtrees corresponding
to the orthologous groups (Fig. 10c). The node connecting
the orthologs from the two evolutionarily most distantly
related species indicates the minimum age of the protein.
Any species that can be traced back to this ancestral node must
have the protein present unless it was secondarily lost. Gene
duplications are indicated by a duplicated subtree in the
phylogeny. These subtrees generally represent sequences from at
least overlapping species sets. If a species is represented only in
one of the two subtrees, it must have lost one copy of the