Computational Drug Discovery and Design

available in WEKA, KNIME, and RapidMiner, which makes it easy
to experiment with a large number of data processing and machine
learning algorithms.
One common format for storing the feature values of a group of
instances, supported by almost all of the above-mentioned
machine-learning platforms, is the Attribute-Relation File Format
(ARFF). An ARFF file consists of a header part and a data part.
The header part contains the title, the name of the relation, and
a list of attributes (features) with their types. The data part
contains the values of the calculated features together with the
class information for each instance. In Fig. 3 we present a dummy
ARFF file containing four features (molecular weight, mean
hydrophobicity, aromatic amino acid composition, and charged
amino acid composition) for two classes: drug targets and
nontargets.
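
The header/data layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the file shown in Fig. 3: the helper name to_arff, the attribute names, the relation name, and the two data rows are all made up for the example, and the writer handles only plain numeric values and unquoted nominal labels.

```python
def to_arff(relation, attributes, rows, title=""):
    """Serialize feature rows into ARFF text.

    attributes: list of (name, type) pairs, where type is either a string
    such as "NUMERIC" or a list of nominal labels (used here for the class).
    rows: one tuple per instance, in the same order as the attributes.
    """
    lines = []
    if title:
        lines.append(f"% {title}")          # comment line used as the title
    lines.append(f"@RELATION {relation}")
    lines.append("")
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):  # nominal attribute: {a,b,...}
            kind = "{" + ",".join(kind) + "}"
        lines.append(f"@ATTRIBUTE {name} {kind}")
    lines.append("")
    lines.append("@DATA")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("row length does not match attribute count")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute names for the four features named in the text.
ATTRIBUTES = [
    ("molecular_weight", "NUMERIC"),
    ("mean_hydrophobicity", "NUMERIC"),
    ("aromatic_aa_composition", "NUMERIC"),
    ("charged_aa_composition", "NUMERIC"),
    ("class", ["target", "nontarget"]),
]

# Made-up values, for illustration only.
ROWS = [
    (52000.5, -0.21, 0.09, 0.24, "target"),
    (31000.0, 0.35, 0.12, 0.18, "nontarget"),
]

arff_text = to_arff("drug_targets", ATTRIBUTES, ROWS,
                    title="Dummy drug target dataset")
print(arff_text)
```

Writing the result to a file with an .arff extension should give something the platforms mentioned above can load, although attribute names containing spaces or special characters would need quoting that this sketch does not do.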

Dataset consisting of known drug targets and non-drug targets

Fixed-length representation of the protein sequences by calculating measurable properties such as amino acid composition, dipeptide composition, physicochemical properties, pseudo amino acid composition, etc.

Training using machine learning algorithms

Selection of best prediction model

Performance evaluation using K-fold cross-validation, leave-one-out cross-validation, or a training/testing set split

Fig. 1 Schematic representation of the steps for drug target prediction
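
Two of the steps in the scheme above, the fixed-length representation and the cross-validation split, can be sketched without any machine-learning library. The code below is an illustrative assumption rather than the chapter's own implementation: the function names amino_acid_composition and k_fold_indices are hypothetical, only the 20 standard amino acids are counted, and the folds are built without shuffling or class stratification.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Map a protein sequence of any length to a 20-value vector.

    Each value is the fraction of one standard amino acid in the
    sequence; non-standard letters are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Setting k equal to n_samples gives leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Fold i takes every k-th sample starting at i.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield train_idx, test_idx


# Toy sequences, made up for illustration.
sequences = ["MKTAYIAKQR", "GAVLIPFW", "DEKRHDEKRH", "MSTNQCYW", "AAC", "WWYF"]
features = [amino_acid_composition(s) for s in sequences]

for train_idx, test_idx in k_fold_indices(len(features), k=3):
    print("train:", train_idx, "test:", test_idx)
```

Every sequence ends up as a vector of the same length regardless of how many residues it has, which is what lets the training step treat proteins of different sizes uniformly. In practice the samples would normally be shuffled, and often stratified by class, before being split into folds.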


Human Drug Targets and Their Interactions 23