Computational Drug Discovery and Design

2.2 Databases for Creating the Dataset

A number of databases are now available that contain specialized information about drugs and their targets. For example, DrugBank [19] contains information about drug–target relationships; likewise, Matador [20] contains both direct and indirect drug–target relationships. Similarly, SuperTarget [20] and the Therapeutic Target Database (TTD) [21] provide high-quality information about drug–target relationships. Integrity [22] provides associated disease information along with the drug–target relationship data. The Potential Drug Target Database (PDTD) [23] augments drug–target relationship information with structural data on the target, while BindingDB [24] is one of the major databases of experimentally derived protein–ligand binding affinities.

2.3 Tools and Servers for the Calculation of Features

A number of stand-alone programs and web servers are available for generating a variety of features of protein targets and non-targets. PROFEAT [25, 26] is one of the oldest web servers capable of calculating structural and physicochemical properties from protein sequences. PseAAC-builder [27], a stand-alone program, and PseAAC [28], a web server, are dedicated to generating various modes of pseudo amino acid composition [29]. Pse-in-One [30] is a web server for calculating pseudo components for proteins as well as nucleic acids. Complementary to these programs, ProtDcal [31] can be used to generate a number of numerical descriptors from protein sequences as well as from 3D structures. Apart from the above-mentioned stand-alone programs and web servers, propy [32] and protr [33], a Python and an R package respectively, may be used to calculate a large number of attributes as the problem requires. Various types of molecular descriptors of compounds can also be easily computed using tools such as PaDEL [34], MODEL [35], and Mold2 [36]. These calculated descriptors can then be used as features in developing prediction models for drug–target interactions.
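To make the idea of a sequence-derived feature concrete, the following is a minimal sketch of plain amino acid composition, one of the simplest descriptors of the kind such tools compute. It is written in pure Python and does not reproduce the interface of any of the packages cited above; the function name `amino_acid_composition` and the toy sequence are assumptions introduced only for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper for illustration; non-standard residue codes
    (e.g. X, B, Z) are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per amino acid, always in the order given by AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, assumed for illustration only (not a real drug target).
example = "MKTAYIAKQR"
features = amino_acid_composition(example)
print(len(features))  # length of the feature vector
```

Each protein, whatever its length, is thus mapped to a fixed-length vector of 20 numbers, which is the form most learning algorithms expect. Pseudo amino acid composition extends this vector with sequence-order terms, which this sketch does not attempt.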

3 Methods

3.1 Dataset Creation

Any supervised learning algorithm requires a labeled dataset, i.e., data instances along with their classes. The foremost requirement for training such an algorithm is a benchmark dataset in which the various classes (in binary classification, the positive and negative classes) are properly represented, but this is seldom the case. A dataset is said to be imbalanced when the number of data points (instances/examples) belonging to one class overwhelms the number of data points of the other class. In human drug target prediction, the number of instances belonging to the drug target class is small compared to the number of non-targets. In such cases the machine learning

Human Drug Targets and Their Interactions 25
