Computational Drug Discovery and Design

nondisease, deleterious/benign). Clicking on the circles will
again reveal a table displaying the description of the effect,
variation states, the source database from which the information
is obtained (e.g., COSMIC), and the associated disease
with clickable links to the corresponding disease databases
(e.g., OMIM). When any feature defining node/box is
clicked, the column corresponding to the location of the
feature on the sequence is highlighted all the way on the
vertical axis. This enables the user to see whether a recorded
variation corresponds to an active site or a structural unit
of the protein.
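The column highlighting described above amounts to an overlap check between a variant position and the annotated feature ranges. A minimal sketch of that check follows; the `Feature` type, the `features_at` helper, and the example coordinates are hypothetical illustrations, not UniProt data or part of any UniProt API.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str   # feature type, e.g. "Active site" or "Domain"
    start: int  # 1-based start position on the sequence, inclusive
    end: int    # 1-based end position on the sequence, inclusive


def features_at(position: int, features: list[Feature]) -> list[Feature]:
    """Return every feature whose range covers the given sequence position."""
    return [f for f in features if f.start <= position <= f.end]


# Made-up annotations for illustration only.
features = [
    Feature("Domain", 10, 120),
    Feature("Active site", 57, 57),
    Feature("Helix", 200, 215),
]

# A variant at position 57 falls inside the domain and on the active site.
overlapping = features_at(57, features)
print([f.kind for f in overlapping])
```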
It is possible to observe all of the recorded features of a
protein in one place using UniProt protein pages and their
visualizer, the feature viewer tool. Both the information
curated from the literature and the information imported
from other biological resources are extensively referenced, so
that the user can check the source data repository or
publication for a more detailed investigation. When the taxonomic
information (i.e., organism) matches between PQ and the observed
protein (along with the conditions that the observed protein is
100% identical to PQ and that their sequence lengths are the same),
all of the recorded features (including variants and
mutagenesis) are applied to PQ as well, since they are basically
the same protein.
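The transfer rule above combines three conditions: same organism, 100% identity, and equal sequence length. A minimal sketch of such a check is given below, assuming both sequences are available as plain strings; `ProteinRecord` and `can_transfer_features` are hypothetical names, and the records are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    accession: str  # placeholder identifier, not a real UniProt accession
    organism: str   # taxonomic information, e.g. species name
    sequence: str   # amino acid sequence in one-letter code


def can_transfer_features(query: ProteinRecord, observed: ProteinRecord) -> bool:
    """Check the three conditions for applying observed features to the query."""
    same_organism = query.organism == observed.organism
    same_length = len(query.sequence) == len(observed.sequence)
    # Full-string equality covers 100% identity over the whole sequence;
    # the length check is kept explicit to mirror the conditions in the text.
    identical = query.sequence == observed.sequence
    return same_organism and same_length and identical


pq = ProteinRecord("PQ", "Homo sapiens", "MKTAYIAKQR")
same = ProteinRecord("OBS_1", "Homo sapiens", "MKTAYIAKQR")
ortholog = ProteinRecord("OBS_2", "Mus musculus", "MKTAYIAKQR")
fragment = ProteinRecord("OBS_3", "Homo sapiens", "MKTAYIAK")

print(can_transfer_features(pq, same))      # all three conditions hold
print(can_transfer_features(pq, ortholog))  # organism differs
print(can_transfer_features(pq, fragment))  # length differs
```

The explicit length check matters when identity is taken from a BLAST report, since a hit can be 100% identical over the aligned region while being shorter or longer than the query.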
This way, the known active site information has been
observed for PQ; however, for some proteins this
information is incomplete. In this case, the user is advised to
continue from Subheading 2.3 to complement the observed
sequence features with predictions.

2.3 Multiple Sequence Alignment (Clustal Omega)


In order to uncover the potential active/critical sites in PQ, a
conservation-based approach, multiple sequence alignment
(MSA), will be performed. To perform MSA for PQ, select the
similar sequences (i.e., BLAST hits) by checking the boxes in the
first column of the UniProt BLAST results. However, sequences
that are 100% identical to the query protein often lead to
overestimation of critically important residues. For this reason,
these sequences will be left out. Once the sequences are selected
(including PQ), click the Align button to start the MSA using the
integrated Clustal Omega program. It is also possible to run MSA
in UniProt by entering the sequences manually on the Align
interface (http://www.uniprot.org/align/) or by using the Clustal
Omega tool interface (http://www.ebi.ac.uk/Tools/msa/clustalo/),
where the user is able to change the parameters of the tool
(see Note 3).
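For users who prepare the input by hand, the selection step above can be sketched as a small filter that drops hits that are 100% identical to the query and writes the rest, together with PQ, as FASTA text. The `Hit` type, the helper names, and the sequences below are hypothetical placeholders; the sketch assumes percent identity has already been read from the BLAST results.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    accession: str   # placeholder identifier, not a real UniProt accession
    identity: float  # percent identity to the query, as reported by BLAST
    sequence: str    # amino acid sequence in one-letter code


def select_for_msa(hits: list[Hit], max_identity: float = 100.0) -> list[Hit]:
    """Keep hits whose identity to the query is strictly below max_identity."""
    return [h for h in hits if h.identity < max_identity]


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Render (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Made-up query and hits for illustration only.
query = ("PQ", "MKTAYIAKQR")
hits = [
    Hit("HIT_A", 100.0, "MKTAYIAKQR"),  # identical to the query, left out
    Hit("HIT_B", 92.5, "MKTAYLAKQR"),
    Hit("HIT_C", 71.0, "MRTAYLSKQK"),
]

kept = select_for_msa(hits)
fasta = to_fasta([query] + [(h.accession, h.sequence) for h in kept])
print(fasta)
```

The resulting FASTA text is in a form that can be pasted into the Align or Clustal Omega web interfaces mentioned above.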
The resulting page displays the output MSA with site-specific
conservation information for each position in the alignment, shown
with the symbols “*”, “:”, “.”, and “ ”, indicating that all residues at the

58 Heval Atas et al.
