- Reconstructing the evolutionary history of old proteins, which
 can easily extend over a billion years or more, is tricky because the
 phylogenetic signal in the data is generally insufficient. In
 particular, old events are frequently not reconstructed
 accurately. While in some instances such problems are indicated
 by low branch support values, this is not always the case. There
 is unfortunately no easy way out of this problem, because the
 truth is unknown. As a rule of thumb, misplaced sequences in a
 tree typically require the assumption of a plethora of additional
 evolutionary events, mostly independent gene losses, to
 explain the present-day data. Although such complex scenarios
 cannot be ruled out a priori, they should at least raise the suspicion
 of a tree reconstruction artifact.
- Quite often you will fail to fully explain the
 evolutionary history of all sequences. Typically, individual
 sequences end up in places in the tree where they simply do
 not make sense. There are several possible explanations,
 the most common being methodological artifacts at any level of the
 analysis, from gene prediction in the genome to phylogeny
 reconstruction.
- For those who are not familiar with trees in Newick format,
 and for more experienced users as well, modifying Newick
 trees is painful, especially as trees become larger. One way
 to do it is with the help of Baobab, a tree visualization and
 editing tool. You can modify the tree graphically and
 then export the modified tree in Newick format.
- Make sure that the identifiers of the sequences you upload into
 DoMosaics [30] are identical to the leaf labels of the tree. Only
 then can the tool link the information.
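The identifier check described in the last note can be automated before uploading the data. The following is a minimal sketch in plain Python, not part of DoMosaics: the function names are hypothetical, the Newick parser handles only simple unquoted labels in trees with at least two leaves, and it assumes that the sequence identifier is the first whitespace-delimited token of each FASTA header.

```python
import re


def newick_leaf_labels(newick):
    """Return the set of leaf labels in a Newick string.

    Assumption: labels are unquoted and contain no whitespace.
    A leaf label directly follows '(' or ','; labels that follow ')'
    belong to internal nodes (e.g. support values) and are skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def fasta_ids(fasta_text):
    """Return the set of sequence identifiers in FASTA-formatted text.

    Assumption: the identifier is the first token after '>'.
    """
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def compare_ids(newick, fasta_text):
    """Return (leaves without a sequence, sequences without a leaf)."""
    leaves = newick_leaf_labels(newick)
    seqs = fasta_ids(fasta_text)
    return sorted(leaves - seqs), sorted(seqs - leaves)


# Toy data for illustration only.
tree = "((seqA:0.1,seqB:0.2)90:0.3,seqC:0.4);"
fasta = ">seqA some description\nMKV\n>seqB\nMKL\n>seqD\nMRT\n"

missing_in_fasta, missing_in_tree = compare_ids(tree, fasta)
print(missing_in_fasta)  # ['seqC']
print(missing_in_tree)   # ['seqD']
```

Both lists should be empty before the tree and the sequences are loaded together. For trees with quoted labels or other Newick extensions, a dedicated tree library is a safer choice than the regular expression used here.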
Acknowledgment
This work was supported by the Marie Curie ITN project
CALIPSO (GA ITN-2013 607 607), and by the Deutsche
Forschungsgesellschaft (EB 285/2-1).
References
1. Wetterstrand KA (2016) DNA sequencing
 costs: data from the NHGRI large-scale
 genome sequencing program.
 http://www.genome.gov/sequencingcostsdata.
 Accessed 4 Sept. 2016
2. Vitulo N, Vezzi A, Romualdi C et al (2007) A
 global gene evolution analysis on Vibrionaceae
 family using phylogenetic profile. BMC Bioinformatics
 8(Suppl 1):S23.
 https://doi.org/10.1186/1471-2105-8-S1-S23
3. Sun J, Xu J, Liu Z et al (2005) Refined phylogenetic
 profiles method for predicting protein-protein
 interactions. Bioinformatics 21:3409–3415.
 https://doi.org/10.1093/bioinformatics/bti532
4. Pellegrini M, Marcotte EM, Thompson MJ
 et al (1999) Assigning protein functions by
 comparative genome analysis: protein phylogenetic
 profiles. Proc Natl Acad Sci U S A 96:4285–4288.
 https://doi.org/10.1073/pnas.96.8.4285
140 Arpit Jain et al.
