AMPK Methods and Protocols


  1. Reconstructing the evolutionary history of ancient proteins,
    which can easily span a billion years or more, is difficult
    because the phylogenetic signal in the data is generally
    insufficient. In particular, ancient events are frequently
    reconstructed inaccurately. In some instances such problems
    are indicated by low branch support values, but this is not
    always the case. Unfortunately, there is no easy way around
    this problem, because the true tree is unknown. As a rule of
    thumb, a misplaced sequence in a tree typically forces the
    assumption of many additional evolutionary events, mostly
    independent gene losses, to explain the present-day data.
    Although such complex scenarios cannot be ruled out a priori,
    they should at least raise suspicion of a tree reconstruction
    artifact.

  2. You will quite often fail to fully explain the evolutionary
    history of all sequences. Typically, individual sequences end
    up at positions in the tree that simply do not make sense.
    Several explanations are possible; the most common are
    methodological artifacts at any level of the analysis, from
    gene prediction in the genome to phylogeny reconstruction.

  3. Modifying trees in Newick format by hand is painful, both for
    newcomers to the format and for experienced users, especially
    as trees become larger. One way to do it is with Baobab, a tree
    visualization and modification program. You can modify the
    tree graphically and then export the modified tree in Newick
    format.

  4. Make sure that the identifiers of the sequences you upload into
    DoMosaics [30] are identical to the leaf labels of the tree. Only
    then can the tool link the information.
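
The identifier check described in Note 4 can be done before uploading anything. The following is a minimal sketch, not part of DoMosaics. It assumes that a sequence identifier is the first whitespace-delimited token of each FASTA header, and that the Newick string uses unquoted leaf labels without bracketed comments. The function names and the example data are hypothetical.

```python
import re


def fasta_ids(fasta_text):
    """Return the set of identifiers (first token after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def newick_leaf_labels(newick):
    """Return the set of leaf labels in a Newick string.

    Assumption: labels are unquoted and the tree has no [comments].
    A leaf label directly follows '(' or ','; labels that follow ')'
    belong to internal nodes and are skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def compare_ids(fasta_text, newick):
    """Return (identifiers only in the FASTA, labels only in the tree)."""
    seq_ids = fasta_ids(fasta_text)
    leaves = newick_leaf_labels(newick)
    return sorted(seq_ids - leaves), sorted(leaves - seq_ids)


# Hypothetical example data: seqD has no leaf, seqC has no sequence.
example_fasta = ">seqA some description\nMKV\n>seqB\nMKL\n>seqD\nMKI\n"
example_tree = "((seqA:0.1,seqB:0.2)n1:0.3,seqC:0.4);"

only_in_fasta, only_in_tree = compare_ids(example_fasta, example_tree)
print("Only in FASTA:", only_in_fasta)
print("Only in tree:", only_in_tree)
```

Both lists should be empty before the sequences and the tree are loaded together; any entry points to an identifier that needs to be renamed on one side.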


Acknowledgment


This work was supported by the Marie Curie ITN project
CALIPSO (GA ITN-2013 607 607), and by the Deutsche
Forschungsgesellschaft (EB 285/2-1).

References



  1. Wetterstrand KA (2016) DNA sequencing costs: data from the
    NHGRI large-scale genome sequencing program.
    http://www.genome.gov/sequencingcostsdata. Accessed 4 Sept.
    2016

  2. Vitulo N, Vezzi A, Romualdi C et al (2007) A global gene
    evolution analysis on Vibrionaceae family using phylogenetic
    profile. BMC Bioinformatics 8(Suppl 1):S23.
    https://doi.org/10.1186/1471-2105-8-S1-S23

  3. Sun J, Xu J, Liu Z et al (2005) Refined phylogenetic profiles
    method for predicting protein-protein interactions.
    Bioinformatics 21:3409–3415.
    https://doi.org/10.1093/bioinformatics/bti532

  4. Pellegrini M, Marcotte EM, Thompson MJ et al (1999) Assigning
    protein functions by comparative genome analysis: protein
    phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288.
    https://doi.org/10.1073/pnas.96.8.4285


140 Arpit Jain et al.
