don’t be too sparse in your taxon sampling, as otherwise evolutionary
conclusions depend too heavily on individual species and the quality
of their genome reconstruction (see Note 5).
- As a starting point, you can use the 78 species currently
represented in the Quest for Orthologs (QfO) reference proteome
collection, which is provided with the HaMStR package.
These species cover the full diversity of the tree of life. However,
because the number of species is comparatively low, individual
phylogenetic lineages are only coarsely resolved.
3.3 Phylogenetic Profiling
Search for orthologs to your pathway components. The end result is a
phylogenetic profile for every gene that records the presence or
absence of orthologs across your collection of target species.
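A phylogenetic profile of this kind can be represented as a presence/absence table with one row per gene and one column per target species. The following is a minimal sketch in Python, not part of the protocol itself. The function names `build_profiles` and `format_profiles`, the gene names, and the species labels are all hypothetical, and the input values are invented purely for illustration.

```python
def build_profiles(orthologs_by_gene, target_species):
    """Map each gene to a list of 1/0 flags, one flag per target species.

    orthologs_by_gene: dict mapping a gene name to the set of species
    in which at least one ortholog was found.
    """
    species = list(target_species)
    profiles = {}
    for gene, species_with_ortholog in orthologs_by_gene.items():
        profiles[gene] = [1 if s in species_with_ortholog else 0 for s in species]
    return profiles


def format_profiles(profiles, target_species):
    """Render the profiles as a tab-separated table, one row per gene."""
    rows = ["gene\t" + "\t".join(target_species)]
    for gene in sorted(profiles):
        rows.append(gene + "\t" + "\t".join(str(flag) for flag in profiles[gene]))
    return "\n".join(rows)


# Illustrative input only: genes, species labels, and values are invented.
example_species = ["SP1", "SP2", "SP3"]
example_orthologs = {
    "GENE_A": {"SP1", "SP2", "SP3"},  # hypothetical gene found everywhere
    "GENE_B": {"SP1"},                # hypothetical gene found in one species
}
example_profiles = build_profiles(example_orthologs, example_species)
print(format_profiles(example_profiles, example_species))
```

Storing the profile as 1/0 flags in a fixed species order keeps the rows directly comparable between genes, which is what later comparisons of profiles rely on.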
Retrieve orthologs directly from an online database such as OMA
[23] or OrthoDB [24]. This provides quick access to precomputed
results and keeps computation to a minimum. In steps 1–5, we
explain ortholog retrieval for individual seed sequences via the
web interface of the OMA database. To use OrthoDB instead,
jump directly to step 6.
If you prefer OMA over OrthoDB, proceed as follows:
- Go to the OMA homepage (http://omabrowser.org/oma/home/).
- Query for orthologs by entering one of the following into the
search field: (1) the OMA sequence identifier, (2) the protein
sequence, (3) the OMA group identifier, or (4) a keyword. The
sequence identifiers in OMA have a specific format. For example,
the human PRKAA1 protein has the OMA identifier HUMAN23295,
where “HUMAN” is the species identifier for Homo sapiens and
“23295” is the sequence identifier in the OMA human gene set
database.
- When the search is performed with an OMA identifier, the
output page provides detailed information about the seed protein.
When the search is instead initiated with a protein sequence, you
may have to follow the “Entry” link representing your query
organism (“Homo sapiens” in this case). This leads to a page with
all information related to the seed protein.
- Follow the link “Orthologs.” This shows all orthologs to the
seed protein. Download the sequences in FASTA format and give
the file a meaningful name. For example, if you search for
orthologs to PRKAA1, choose AMPK_PRKAA1_Orthologs.fa as
the file name.
- Filter the orthologs file and retain only the sequences from
your target species. Save this modified file under a new name such
as AMPK_PRKAA1_Orthologs_Filtered.fa.
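The identifier format and the filtering step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes that each FASTA header starts with the OMA identifier as its first whitespace-delimited token, and that an OMA identifier consists of a five-character species code followed by digits, as in HUMAN23295. The header layout of a real OMA download may differ and should be checked before use. The function names are hypothetical, and the sequences in the demo are dummy placeholders.

```python
import re

# Assumption: five-character species code followed by a numeric part.
OMA_ID = re.compile(r"^([A-Z0-9]{5})(\d+)$")


def split_oma_id(oma_id):
    """Split an OMA identifier into (species code, sequence number)."""
    match = OMA_ID.match(oma_id)
    if match is None:
        raise ValueError("not a recognised OMA identifier: %r" % oma_id)
    return match.group(1), int(match.group(2))


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_species(records, target_species):
    """Keep only records whose identifier carries a target species code."""
    for header, sequence in records:
        tokens = header.split()
        if not tokens:
            continue
        try:
            species, _ = split_oma_id(tokens[0])
        except ValueError:
            continue  # skip headers that do not follow the assumed format
        if species in target_species:
            yield header, sequence


# Dummy input: the sequences below are placeholders, not real proteins.
demo_lines = [">HUMAN23295", "MAAA", "GGG", ">MOUSE00001", "MCCC"]
for header, sequence in filter_by_species(read_fasta(demo_lines), {"HUMAN"}):
    print(">" + header)
    print(sequence)
```

To apply this to a downloaded file, pass an open file handle for AMPK_PRKAA1_Orthologs.fa to `read_fasta` and write the retained records to AMPK_PRKAA1_Orthologs_Filtered.fa. Records with unrecognised headers are skipped silently here, so it is worth comparing record counts before and after filtering.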
If you prefer OrthoDB over OMA, proceed as follows: