- Reconstructing the evolutionary history of old proteins, which
can easily extend over a billion years or more, is tricky because
the phylogenetic signal in the data is generally insufficient. In
particular, old events are frequently not accurately reconstructed.
While in some instances such problems are indicated by low branch
support values, this is not always the case. There is unfortunately
no easy way out of this problem, because the truth is unknown. As a
rule of thumb, misplaced sequences in a tree typically require the
assumption of a plethora of additional evolutionary events, mostly
independent gene losses, to explain the present-day data. Although
such complex scenarios cannot be ruled out a priori, they should at
least raise suspicion of a tree reconstruction artifact.
- You will quite often fail to fully explain the
evolutionary history of all sequences. Typically, individual
sequences end up in places in the tree where they simply do
not make sense. There are several possible explanations,
among which methodological artifacts at all levels of the
analysis, from gene prediction in the genome to phylogeny
reconstruction, prevail.
- Modifying trees in Newick format is painful for newcomers and
experienced users alike, especially when trees become larger. One way
to do it is with the help of Baobab, a tree visualization and
modification software. You can modify the tree graphically and
then export the modified tree in Newick format.
- Make sure that the identifiers of the sequences you upload into
DoMosaics [30] are identical to the leaf labels of the tree. Only
then can the tool link the information.
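
The identifier check in the last note can be done before uploading anything. The following is a minimal sketch, not part of DoMosaics: it assumes the sequences are in FASTA format, that the identifier is the first whitespace-delimited token of each header line, and that the Newick leaf labels are unquoted. The function names and the example data are hypothetical.

```python
import re


def newick_leaf_labels(newick):
    """Return the set of unquoted leaf labels in a Newick string.

    A leaf label directly follows "(" or ","; labels or support values
    of internal nodes follow ")" and are therefore not captured.
    Assumption: labels are unquoted and the tree has more than one leaf.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;\[\]']+)", newick))


def fasta_ids(fasta_text):
    """Return the set of identifiers (first token of each header line)."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def compare_labels(newick, fasta_text):
    """Return (labels only in the tree, identifiers only in the FASTA)."""
    leaves = newick_leaf_labels(newick)
    ids = fasta_ids(fasta_text)
    return sorted(leaves - ids), sorted(ids - leaves)


# Hypothetical example data: the yeast identifier differs in case.
tree = "((human_P1:0.1,mouse_P1:0.2)0.98:0.05,yeast_P1:0.4);"
fasta = ">human_P1 some description\nMKV\n>mouse_P1\nMKI\n>yeast_p1\nMRV\n"

only_in_tree, only_in_fasta = compare_labels(tree, fasta)
print("Only in tree: ", only_in_tree)
print("Only in FASTA:", only_in_fasta)
```

Both lists should be empty before the data are uploaded. The comparison is case-sensitive on purpose, since a difference in case alone is enough to keep a sequence from matching its leaf.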
Acknowledgment
This work was supported by the Marie Curie ITN project
CALIPSO (GA ITN-2013 607 607) and by the Deutsche
Forschungsgesellschaft (EB 285/2-1).
References
1. Wetterstrand KA (2016) DNA sequencing
costs: data from the NHGRI large-scale
genome sequencing program.
http://www.genome.gov/sequencingcostsdata.
Accessed 4 Sept. 2016
2. Vitulo N, Vezzi A, Romualdi C et al (2007) A
global gene evolution analysis on Vibrionaceae
family using phylogenetic profile. BMC
Bioinformatics 8(Suppl 1):S23.
https://doi.org/10.1186/1471-2105-8-S1-S23
3. Sun J, Xu J, Liu Z et al (2005) Refined
phylogenetic profiles method for predicting
protein-protein interactions. Bioinformatics
21:3409–3415.
https://doi.org/10.1093/bioinformatics/bti532
4. Pellegrini M, Marcotte EM, Thompson MJ
et al (1999) Assigning protein functions by
comparative genome analysis: protein
phylogenetic profiles. Proc Natl Acad Sci U S A
96:4285–4288.
https://doi.org/10.1073/pnas.96.8.4285
140 Arpit Jain et al.