duplicated gene (Fig. 10d). If gene loss appears prevalent in
one subtree, it is worthwhile considering a tree reconstruction
artifact rather than a gene duplication (Fig. 11; see Note 18).
- When a sequence tree does not follow your expectation
(Fig. 10), you can test whether it explains the data significantly
better than the tree you were expecting [43]. For example, the
ML tree in Fig. 11 places the fungal sequences at positions that
disagree with the commonly accepted eukaryote phylogeny.
- To test whether this discrepancy is indeed significantly supported
by the data, modify a copy of the maximum likelihood
tree such that it reflects the expected tree (see Note 19).
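The comparison described above can be sketched as a paired test on per-site log-likelihoods. The text does not say which test reference [43] uses, so the block below is only an illustration in the style of a Kishino-Hasegawa test with a normal approximation. The function name `paired_site_test` and the toy numbers are hypothetical; real per-site log-likelihoods would come from your tree reconstruction software. Because the ML tree was selected using the same data, tests that correct for this selection (for example the Shimodaira-Hasegawa or approximately unbiased test) are usually preferred in practice.

```python
import math


def paired_site_test(lnl_ml, lnl_expected):
    """Compare two trees via their per-site log-likelihoods.

    Returns (delta, z, p): the total log-likelihood difference
    (ML tree minus expected tree), its z-score, and a one-sided
    p-value from a normal approximation.
    """
    if len(lnl_ml) != len(lnl_expected):
        raise ValueError("both trees must be scored on the same alignment sites")
    n = len(lnl_ml)
    if n < 2:
        raise ValueError("at least two sites are needed")

    # Per-site differences; their sum is the overall lnL difference.
    diffs = [a - b for a, b in zip(lnl_ml, lnl_expected)]
    delta = sum(diffs)

    # Standard error of the summed difference from the sample variance.
    mean = delta / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    std_err = math.sqrt(n * variance)
    if std_err == 0:
        raise ValueError("per-site differences show no variation")

    z = delta / std_err
    # One-sided p-value P(Z >= z) under the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return delta, z, p


# Hypothetical toy values for four alignment sites (not real data).
toy_ml = [-2.0, -3.0, -1.0, -4.0]
toy_expected = [-2.5, -3.0, -1.5, -4.0]
delta, z, p = paired_site_test(toy_ml, toy_expected)
print(f"delta lnL = {delta:.3f}, z = {z:.3f}, one-sided p = {p:.4f}")
```

A small p-value would suggest that the ML tree explains the data significantly better than the expected tree; otherwise the expected tree cannot be rejected.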
[Fig. 10, panels (A)-(D): schematic species trees for yeast, human, A. thaliana, and E. coli with embedded sequence trees. Marked are the proteins Y1, H1, At1, and Ec1 (with paralogs Y1a, H1a, At1a, Y1b, H1b, and At1b), the speciation events S1-S3, the gene duplications, and the time axis.]
Fig. 10 The evolution of species and of their genes. (a) The outer tree in black represents the evolutionary
relationships of the species, and we refer to it as the species tree. S1–S3 denote speciation events. The
sequence tree (syn. gene tree) connecting the four proteins Y1, H1, At1, and Ec1 is shown in gray. As the
evolutionary lineages of the four proteins were separated by a speciation event rather than by a gene
duplication, we call them orthologs. The sequence tree connecting the four proteins is, thus, congruent with
the species tree. (b) Quite often, the reconstructed sequence tree deviates from the species tree. This can
indicate tree reconstruction artifacts that result in a sequence tree not accurately reflecting the phylogenetic
signal in the data. Alternatively, the sequences placed at unexpected positions might not be orthologous
(denoted here by the “?” appended to H1). (c) An idealized sequence tree of proteins whose evolutionary
histories include a gene duplication (paralogs). Note that the sequence subtrees originating at the gene
duplication event each reflect the evolutionary relationships of the species that diversified after the gene
duplication event (black subtrees). (d) The minimal evolutionary scenario required to explain the gene tree in
(b). It invokes one gene duplication and three independent gene losses.
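The count given for panel (d) can be checked with a small reconciliation sketch. The block below uses the common last-common-ancestor (LCA) mapping between a binary gene tree and a binary species tree; it is an illustration, not the procedure used to draw the figure. The species topology (((Yeast, Human), A. thaliana), E. coli) and the gene tree (((Y1, At1), H1), Ec1) for panel (b) are assumptions read from the figure labels, and all function names are hypothetical.

```python
def leaf_set(node):
    """Return the frozenset of leaf names below a nested-tuple tree node."""
    if isinstance(node, tuple):
        return frozenset().union(*(leaf_set(child) for child in node))
    return frozenset([node])


def index_species_tree(tree):
    """Map every species-tree node (keyed by its leaf set) to parent and depth."""
    parent, depth = {}, {}

    def walk(node, par, d):
        key = leaf_set(node)
        parent[key], depth[key] = par, d
        if isinstance(node, tuple):
            for child in node:
                walk(child, key, d + 1)

    walk(tree, None, 0)
    return parent, depth


def lca(a, b, parent, depth):
    """Climb from the deeper node until both species-tree nodes coincide."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return a


def reconcile(gene_tree, species_of, species_tree):
    """Return (duplications, losses) implied by LCA reconciliation."""
    parent, depth = index_species_tree(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([species_of[node]])
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent, depth)
        # A node mapping to the same species node as one child is a duplication.
        is_dup = here in (left, right)
        if is_dup:
            counts["duplications"] += 1
        for child in (left, right):
            skipped = depth[child] - depth[here]
            # Each species-tree branch point passed without a gene is one loss.
            counts["losses"] += skipped if is_dup else skipped - 1
        return here

    visit(gene_tree)
    return counts["duplications"], counts["losses"]


# Assumed topologies, read from the labels of Fig. 10.
species_tree = ((("Yeast", "Human"), "A. thaliana"), "E. coli")
species_of = {"Y1": "Yeast", "H1": "Human", "At1": "A. thaliana", "Ec1": "E. coli"}
gene_tree_a = ((("Y1", "H1"), "At1"), "Ec1")  # congruent tree, panel (a)
gene_tree_b = ((("Y1", "At1"), "H1"), "Ec1")  # deviating tree, panel (b)

print(reconcile(gene_tree_a, species_of, species_tree))  # (0, 0)
print(reconcile(gene_tree_b, species_of, species_tree))  # (1, 3)
```

Under these assumptions, the congruent tree of panel (a) needs no duplication or loss, whereas the tree of panel (b) requires one duplication and three losses, in line with panel (d).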
134 Arpit Jain et al.