Computational Drug Discovery and Design

size of results, while less significant hits can be discarded by
unchecking their boxes before the MSA procedure. The third
option is the selection of the similarity matrix. A similarity
matrix assigns a score to each amino acid pair that reflects how
likely the corresponding substitution is to be observed in
nature. BLOSUM62 is the generally accepted matrix for detecting
weak similarities between protein sequences. BLOSUM80 performs
better at finding highly similar proteins, and BLOSUM45 is a
better choice for detecting distantly related sequences. The
Auto option can be selected to let the algorithm choose the best
matrix for the query sequence. The fourth option filters
low-complexity regions in proteins to avoid false matches
specific to those regions. If the query protein is known to
include low-complexity regions, this filter can be selected to
obtain more specific results. The fifth option switches gapped
alignment on or off; the default selection is “yes.” The last
option limits the number of returned hits. Usually, 250 hits or
fewer are sufficient to build an MSA.
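
As a rough illustration of how a similarity matrix is used, the sketch below scores two equal-length, ungapped protein fragments by summing per-pair substitution scores. The function names (`pair_score`, `ungapped_score`) and the example fragments are hypothetical, and the dictionary holds only a small excerpt of BLOSUM62 entries that should be checked against an authoritative copy of the matrix; real search tools additionally handle gaps and score statistics, which are omitted here.

```python
# Small excerpt of BLOSUM62 scores (the matrix is symmetric).
# Assumption: these values follow the commonly distributed BLOSUM62 table;
# verify against an authoritative copy before relying on them.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("W", "W"): 11,
    ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("K", "R"): 2, ("D", "E"): 2, ("A", "W"): -3,
}


def pair_score(a, b, matrix=BLOSUM62_EXCERPT):
    """Return the substitution score for residues a and b, in either order."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    if (b, a) in matrix:
        return matrix[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt of the matrix")


def ungapped_score(seq1, seq2, matrix=BLOSUM62_EXCERPT):
    """Sum the per-position scores of two equal-length, ungapped fragments."""
    if len(seq1) != len(seq2):
        raise ValueError("fragments must have the same length (no gaps here)")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    # Identical fragments score highest; conservative substitutions
    # (L/I, K/R, D/E) still score positively.
    print(ungapped_score("ALKD", "ALKD"))
    print(ungapped_score("ALKD", "AIRE"))
```

Switching to a different matrix (e.g., BLOSUM80 or BLOSUM45) would amount to passing a different score table as `matrix`, which changes how strongly identities are rewarded and substitutions are penalized.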


  1. The sources of protein annotations in UniProt include:
    (1) in-house manual curation, (2) in-house automated
    predictions, and (3) imports from external resources. In this
    way, UniProt aims to provide comprehensive information on the
    properties of proteins. However, the included data do not
    cover the results of most of the external predictive
    approaches available in the literature. For this reason, it is
    natural to observe differences between the information in
    UniProt and the results of a prediction method (e.g.,
    functional site information obtained from different resources
    in our case study). UniProt aims to incorporate only the most
    reliable annotations backed by strong evidence; as a result, a
    predictive approach may provide relatively higher coverage of
    a query protein, although a portion of its predictions may be
    false positives. Nevertheless, predicted information obtained
    from other resources can be evaluated along with UniProt
    annotations. For example, predicted active residues can be
    compared to the mutagenesis and variation information provided
    in UniProt protein pages, which can be utilized to infer
    disease relations and for drug targeting.

  2. The parameters of Clustal Omega are as follows: The first
    option is the selection of the input data type as protein,
    DNA, or RNA. The second is the output format, which can be
    selected among Clustal, Pearson, MSF, PHYLIP, etc. The
    “More options” button reveals further parameters of the tool,
    such as the number of iterations at different steps of the
    algorithm (e.g., sequence alignment and tree generation).
    A higher number of iterations refines the results at the
    cost of longer computation times. Usually the default values


66 Heval Atas et al.
