- Train a profile hidden Markov model using hmmbuild from the HMMER package [31]: hmmbuild PRKAA1.hmm PRKAA1.aln.
- Create two directories, hmm_dir and aln_dir, inside pathToHaMStR/core_orthologs/PRKAA1/. Place PRKAA1.aln inside aln_dir and PRKAA1.hmm inside hmm_dir.
- Run the HaMStR search. Each target species requires its own HaMStR search. An example command looks like the following: pathToHaMStR/bin/hamstr -sequence_file=pathToInfile/inFile.fa -taxon=yourTaxonName -hmmset=PRKAA1 -refspec=HUMAN@9606 -checkCoorthologsRef -representative (see Note 8).
- Information about the orthologs identified by HaMStR is
stored in a file with the extension “.out”. Each result in this file is reported in the following format: “proteinName|queryTaxonName|targetTaxonName|targetProteinId|representative|targetProteinSequence”. The field “representative” takes a value of either 1 or 0. If HaMStR identifies more than one ortholog (co-orthologs), it determines the one most similar to the seed sequence. This sequence is considered the “representative ortholog” and is flagged with a 1. All other co-orthologs are flagged with a 0. If you set the parameter “-representative” in the HaMStR call, the output of co-orthologs is suppressed, and only the “representative” ortholog is reported.
- Collect the orthologs across all species and add them to the file
AMPK_PRKAA1_Orthologs_Filtered.fa. To keep track of the changes in the file, you may want to rename it to AMPK_PRKAA1_Orthologs_Filtered_extended.fa.
- We recommend deleting the HaMStR output files, especially when working with large collections of seed proteins and many species.
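The result format described in the steps above can be processed with a few lines of Python. The following is a minimal sketch, not part of HaMStR itself: it assumes one result per line, exactly six “|”-separated fields, and no “|” character inside any field. The names HamstrHit, parse_out_lines, representatives, and to_fasta are hypothetical, the FASTA header layout is an assumption that should be adapted to the headers already used in AMPK_PRKAA1_Orthologs_Filtered.fa, and the two example records are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HamstrHit:
    """One record of a HaMStR '.out' file (field order as described in the text)."""
    protein_name: str
    query_taxon: str
    target_taxon: str
    target_protein_id: str
    representative: bool
    sequence: str


def parse_out_lines(lines: Iterable[str]) -> List[HamstrHit]:
    """Parse '|'-separated records; assumes exactly six fields per non-empty line."""
    hits = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        name, query, target, prot_id, rep, seq = fields
        hits.append(HamstrHit(name, query, target, prot_id, rep == "1", seq))
    return hits


def representatives(hits: Iterable[HamstrHit]) -> List[HamstrHit]:
    """Keep only the hits flagged with 1 in the 'representative' field."""
    return [h for h in hits if h.representative]


def to_fasta(hits: Iterable[HamstrHit]) -> str:
    """Format hits as FASTA text. The header layout is an assumption."""
    return "".join(
        f">{h.protein_name}|{h.target_taxon}|{h.target_protein_id}\n{h.sequence}\n"
        for h in hits
    )


if __name__ == "__main__":
    # Made-up example records; real files come from the HaMStR search.
    example = [
        "PRKAA1|HUMAN@9606|yourTaxonName|proteinA|1|MAEKQ",
        "PRKAA1|HUMAN@9606|yourTaxonName|proteinB|0|MAEKH",
    ]
    print(to_fasta(representatives(parse_out_lines(example))), end="")
```

To extend the ortholog file, the string returned by to_fasta could be appended to a copy of AMPK_PRKAA1_Orthologs_Filtered.fa opened in append mode. Filtering on the representative flag after the search is an alternative to passing “-representative” in the HaMStR call, and it keeps the co-orthologs available for inspection.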
Phylogenetic profiling using a targeted ortholog search (steps 19–21).
You can optionally bypass the use of precomputed ortholog sets and directly perform a targeted search for orthologs with HaMStR-OneSeq [26] using a single seed sequence.
- HaMStR-OneSeq is part of the HaMStR package. If you have not already installed HaMStR, download and install the package (see step 9 in Subheading 3.3).
- Save the seed protein sequence in FASTA format under pathToHaMStR/data. Use this file to initiate the ortholog search with HaMStR-OneSeq. A standard search command looks like the following: pathToHaMStR/bin/oneSeq.pl -sequence_file=seedFileName.fa -seqname=seedName