Computational Drug Discovery and Design

on proteins can be determined by experimental approaches; how-
ever, it is challenging to identify those sites for all proteins due to
practical issues and high costs associated with experimental proce-
dures. Therefore, computational approaches have emerged for the
prediction of the functional sites (extensively reviewed in [2]). Most
of the frequently used computational approaches depend on the
information that functional sites are evolutionarily more conserved
than the rest of the protein surface. However, other sequence- and
structure-based features, such as secondary structure information,
solvent accessibility, and structural conservation, can also be used
to distinguish functional sites [3, 4]. Given that
the protein structure is more conserved than the sequence, struc-
tural comparison can recover more distant relationships across
proteins. In previous studies, large-scale comparisons of all known
protein binding sites have shown that, although the global structures
of some protein complexes differ, their binding regions are
structurally similar [2, 5, 6]. Sequence conservation has also been
used in combination with geometric features of functional sites to
improve prediction performance [7].
Determination of the conserved regions in a protein sequence
to predict functional residues starts with the multiple sequence
alignment (MSA) of the query protein sequence and its homologs
[4]. MSA reveals highly conserved positions on the input
sequences. Some methods first construct a phylogenetic tree on
the basis of the MSA results, instead of analyzing sequence
conservation directly from the MSA [8]. A phylogenetic tree represents
the evolutionary relationships between protein sequences, which
reveals subfamily-specific mutations within protein families [9]. The
evolutionary trace (ET) method was developed as the first
implementation of this idea; it considers not only identical
residues but also amino acid similarities [10]. As an improved
version of the ET method, ConSurf also generates
phylogenetic trees of homologous sequences using the neighbor-
joining algorithm based on the MSA results and computes
position-specific conservation scores for each amino acid in the
sequence. It also retrieves structural information on the protein from
the PDB, if available [11, 12]. INTREPID, another functional site
prediction method, performs phylogenetic tree analysis in
combination with a Jensen–Shannon divergence-based positional
conservation score [4]. INTREPID has been extensively compared with
the ET and ConSurf methods in predicting functional residues [13].
The latest release of the ET method, available as a database and web
server, is called Universal Evolutionary Trace (UET) [14].
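
To make the idea of a positional conservation score concrete, the
following is a minimal sketch of a per-column Jensen–Shannon divergence
computed directly from an MSA. It is not the implementation used by
INTREPID, ET, or ConSurf: it omits the phylogenetic tree analysis
entirely, and the uniform background distribution, the treatment of
gaps (ignored), and all function names are assumptions made for
illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column):
    """Relative amino acid frequencies in one MSA column.

    Gaps and any non-standard symbols are ignored (an assumption of this
    sketch). Returns None if the column holds no standard residues.
    """
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return None
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions."""

    def kl(a, m):
        # Kullback-Leibler divergence; terms with a[x] == 0 contribute 0.
        return sum(a[x] * math.log2(a[x] / m[x]) for x in AMINO_ACIDS if a[x] > 0)

    # Mixture distribution of p and q.
    m = {x: 0.5 * (p[x] + q[x]) for x in AMINO_ACIDS}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def conservation_scores(msa):
    """Score every alignment column against a uniform background.

    `msa` is a list of equal-length aligned sequences. Higher scores
    indicate columns whose residue distribution departs more strongly
    from the (assumed) uniform background, i.e. more conserved columns.
    """
    background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    scores = []
    for column in zip(*msa):
        dist = column_distribution(column)
        scores.append(0.0 if dist is None else js_divergence(dist, background))
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment; the first column is fully conserved.
    toy_msa = ["MKVL", "MRVI", "MKAL", "M-VV"]
    for position, score in enumerate(conservation_scores(toy_msa), start=1):
        print(position, round(score, 3))
```

In this toy alignment the fully conserved first column receives the
highest score. A realistic scoring scheme would typically replace the
uniform background with observed amino acid frequencies and weight
sequences to reduce redundancy among close homologs.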
Apart from these methods, there are also machine learning-based
approaches such as PROFisis, which identifies residues at
protein–protein interaction (PPI) interfaces. This method uses PPI
information obtained from experimentally determined 3D structures;
however, it does not require the 3D structure of the query protein for

52 Heval Atas et al.
