will provide a good balance between results’ quality and run
times.
One important point when running an MSA is the similarity
among the input sequences, which should be neither too high
nor too low. The results of a BLAST search may constitute a
good input list for an MSA; nevertheless, sequences that are
100% identical to the BLAST query should be removed from the
list. However, this operation does not remove sequences among
the BLAST results that are identical to each other (i.e., not
identical to the BLAST query sequence). To solve this problem,
the BLAST search can be carried out using UniRef90, instead of
UniProtKB, as the target database. This way, the results will
include only one representative sequence from each 90%
sequence identity cluster. Afterward, the MSA can be run on
these BLAST results with default parameters. This alternative
is especially recommended when BLAST returns many highly
similar sequences when run against the UniProtKB database.
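When UniRef90 is not an option, the manual clean-up described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any of the tools discussed: the function name `filter_hits`, the hit identifiers, and the toy sequences are all hypothetical, and the hits are assumed to be already parsed into (identifier, sequence) pairs. It removes only exact duplicates; it does not reproduce the 90% identity clustering of UniRef90.

```python
def filter_hits(query_seq, hits):
    """Drop hits identical to the query and exact duplicates among the hits.

    hits: list of (identifier, sequence) tuples (hypothetical input format).
    Returns the kept tuples in their original order.
    """
    # Seed the set with the query so that hits identical to it are skipped.
    seen = {query_seq.upper()}
    kept = []
    for identifier, seq in hits:
        key = seq.upper()
        if key in seen:
            continue  # identical to the query or to an earlier hit
        seen.add(key)
        kept.append((identifier, seq))
    return kept


# Toy data, for illustration only.
query = "MKTAYIAKQR"
hits = [
    ("hit1", "MKTAYIAKQR"),  # identical to the query
    ("hit2", "MKTAYLAKQR"),
    ("hit3", "MKTAYLAKQR"),  # identical to hit2
    ("hit4", "MKSAYIAKHR"),
]
filtered = filter_hits(query, hits)
print([identifier for identifier, _ in filtered])
```

In this toy case only hit2 and hit4 are kept. Near-identical sequences (e.g., 99% identity) pass through unchanged, which is why searching against UniRef90 remains the more thorough option.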
- In addition to conservation predictions, both ConSurf and
TraceSuite II label residues as buried or exposed. However,
this information does not indicate whether a buried residue
is located on the protein surface or in the core region.
Functional site residues are located on the surface of the
protein and tend to be buried, whereas residues that are
important for the stability of the protein are located in the
core region. To determine whether a residue is located in the
core or in the surface region, 3D structure information is
necessary and accessible surface area calculations need to be
performed. The sequence-based version of ConSurf, titled
“ConSeq”, aims to distinguish between structural and
functional residues (apart from exposed vs. buried) using
neural networks; however, the performance of this type of
prediction was reported to be relatively low [22].
- Residue numbers in the output of TraceSuite II are labelled
according to the partitioned MSA results, which may not
represent the correct residue positions of the query protein.
In this case, the user needs to trace back and relabel the
residue numbers accordingly.
- One limitation of ConSurf is that it does not run with fewer
than 50 homologous sequences.
- It is possible to run the PROFisis tool without registering
and logging in using this link:
https://ppopen.informatik.tu-muenchen.de/. However, in this
mode it is not possible to store the results of the analysis.
In some cases PROFisis inaccurately displays the positions of
the predicted binding sites on the graphical results. It is
therefore advised to check the precise positions of the
predicted sites either in the text-based output or by hovering
the mouse cursor over the diamond-shaped nodes that represent
the binding sites on the graphical output.
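The relabelling of residue numbers mentioned in the TraceSuite II note above can be sketched as follows. This is a minimal illustration under stated assumptions: the reported positions are assumed to be 1-based alignment column indices, gaps are assumed to be written as "-" or ".", and the function name `column_to_residue_map` and the toy aligned sequence are hypothetical. The actual output format of the tool should be checked before applying such a mapping.

```python
def column_to_residue_map(aligned_query, start=1):
    """Map 1-based alignment columns to residue numbers of the query.

    aligned_query: the query sequence as it appears in the MSA, gaps included.
    start: number of the first query residue (assumed to be 1 by default).
    Columns where the query has a gap are absent from the returned dict.
    """
    mapping = {}
    residue_number = start - 1
    for column, symbol in enumerate(aligned_query, start=1):
        if symbol in "-.":
            continue  # gap in the query: no residue at this column
        residue_number += 1
        mapping[column] = residue_number
    return mapping


# Toy aligned query, for illustration only.
aligned_query = "MK--TAY-IA"
mapping = column_to_residue_map(aligned_query)
print(mapping.get(5))  # alignment column 5 holds the third query residue
print(mapping.get(3))  # column 3 is a gap in the query, so there is no residue
```

If the query sequence submitted to the tool is a fragment of the full-length protein, `start` can be set to the number of the first residue of the fragment so that the relabelled positions match the full-length numbering.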