Computational Drug Discovery and Design


2.2 Databases for Creating the Dataset


A number of databases now provide specialized information about
drugs and their targets. For example, DrugBank [19] contains
information about drug–target relationships; likewise, Matador [20]
covers both direct and indirect drug–target relationships. Similarly,
SuperTarget [20] and the Therapeutic Target Database (TTD) [21]
hold high-quality information about drug–target relationships.
Along with drug–target relationship data, Integrity [22] also
provides associated disease information. The Potential Drug Target
Database (PDTD) [23] augments drug–target relationship information
with structural data on the target, while BindingDB [24] is one of
the major databases of experimentally derived protein–ligand
binding affinities.

2.3 Tools and Servers for the Calculation of Features


A number of stand-alone programs and web servers are available
for generating a variety of features of protein targets and
non-targets. PROFEAT [25, 26] is one of the oldest web servers
capable of calculating structural and physicochemical properties
from protein sequences. PseAAC-builder [27], a stand-alone program,
and PseAAC [28], a web server, are dedicated to the generation of
various modes of pseudo amino acid composition [29]. Pse-in-One [30]
is a web server for calculating pseudo components for proteins as
well as nucleic acids. Complementary to these programs, ProtDcal [31]
can be used to generate a number of numerical descriptors from
protein sequences as well as from 3D structures. Apart from the
above-mentioned stand-alone programs and web servers, propy [32]
and protr [33], a Python and an R package respectively, may be used
to calculate a large number of attributes as the problem requires.
Various types of molecular descriptors of chemical compounds can
also be easily computed using tools such as PaDEL [34], MODEL [35],
and Mold2 [36]. The calculated descriptors can then serve as
features in developing prediction models for drug–target
interactions.
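
As an illustration of the kind of sequence-derived feature these tools
compute, the following minimal sketch calculates the plain amino acid
composition of a protein sequence, i.e., the fraction of each of the 20
standard residues. It is written in plain Python and does not reproduce
the API of any of the packages cited above; the function name
amino_acid_composition and the toy sequence are assumptions introduced
only for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every protein
# maps to a feature vector of the same length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; non-standard symbols
    (e.g., X or gaps) are ignored when counting.
    """
    sequence = sequence.upper()
    counts = Counter(residue for residue in sequence if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, assumed for illustration only (not taken from any database).
toy_sequence = "MKTAYIAKQR"
features = amino_acid_composition(toy_sequence)
print(len(features))  # one value per standard amino acid
```

Each protein is thereby reduced to a fixed-length numeric vector, which
is the form most learning algorithms expect. Pseudo amino acid
composition and the other descriptors mentioned above extend this idea
with sequence-order and physicochemical information.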

3 Methods


3.1 Dataset Creation

Any supervised learning algorithm requires a labeled dataset, i.e.,
data instances along with their classes. The foremost requirement
for training such an algorithm is a benchmark dataset with proper
representation of the various classes (in binary classification, the
positive and negative classes), but this is seldom the case. A
dataset is said to be imbalanced when the number of data points
(instances/examples) belonging to one class overwhelms the number
of data points of the other class. In human drug target prediction,
the number of instances belonging to the drug target class is small
compared to the non-targets. In such cases the machine learning


Human Drug Targets and Their Interactions 25