Computational Drug Discovery and Design

available in WEKA, KNIME, and RapidMiner, which makes it easy to experiment with a large number of data processing and machine learning algorithms. One common format for storing feature values for a group of instances, supported by almost all of the above-mentioned machine learning platforms, is the Attribute-Relation File Format (ARFF). An ARFF file consists of a header and a data part. The header contains the title, the name of the relation, and a list of attributes (features) with their types. The data part contains the values of the calculated features, together with the class information, for each instance. In Fig. 3 we present a dummy ARFF file with four features (molecular weight, mean hydrophobicity, aromatic amino acid composition, and charged amino acid composition) for two classes: drug targets and nontargets.
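The header/data layout described above can be illustrated with a minimal sketch, assuming Python 3 and only the standard library. The helper name `to_arff` is hypothetical, as are the relation name, the attribute names, and the feature values. They loosely mirror the four features of the dummy file and are not real measurements.

```python
# Minimal ARFF writer sketch (standard library only).
# The helper name, relation name, attribute names, and example rows are
# hypothetical placeholders, not values from any real dataset.

def to_arff(relation, attributes, class_values, rows, title=None):
    """Return ARFF text.

    attributes: list of (name, type) pairs for the features, e.g. ("molecular_weight", "NUMERIC").
    class_values: the nominal class labels, e.g. ["target", "nontarget"].
    rows: list of (feature_values, class_label) tuples.
    """
    lines = []
    if title:
        lines.append(f"% {title}")  # '%' starts a comment line in ARFF
    # Header part: relation name and attribute declarations.
    lines.append(f"@RELATION {relation}")
    lines.append("")
    for name, attr_type in attributes:
        lines.append(f"@ATTRIBUTE {name} {attr_type}")
    # Nominal class attribute: the allowed values are listed in braces.
    lines.append("@ATTRIBUTE class {" + ",".join(class_values) + "}")
    lines.append("")
    # Data part: one comma-separated line per instance, class label last.
    lines.append("@DATA")
    for values, label in rows:
        if len(values) != len(attributes):
            raise ValueError("each row needs one value per attribute")
        if label not in class_values:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


attributes = [
    ("molecular_weight", "NUMERIC"),
    ("mean_hydrophobicity", "NUMERIC"),
    ("aromatic_composition", "NUMERIC"),
    ("charged_composition", "NUMERIC"),
]
rows = [  # hypothetical feature values, for illustration only
    ([52000.5, 0.12, 9.8, 24.1], "target"),
    ([31000.0, -0.35, 6.2, 28.7], "nontarget"),
]
arff_text = to_arff("drug_targets", attributes, ["target", "nontarget"], rows,
                    title="Dummy drug target dataset")
print(arff_text)
```

The `@RELATION`, `@ATTRIBUTE`, and `@DATA` declarations are case-insensitive in ARFF, so lowercase keywords would also be accepted.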

Fig. 1 Schematic representation of the steps for drug target prediction. Flowchart steps: dataset consisting of known drug targets and non-drug targets; fixed-length representation of the protein sequences by calculating measurable properties such as amino acid composition, dipeptide composition, physicochemical properties, and pseudo amino acid composition; training using machine learning algorithms; selection of the best prediction model; performance evaluation using K-fold cross-validation, leave-one-out cross-validation, or a training/testing set split.
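The steps in Fig. 1 can be walked through on toy data with a minimal sketch, assuming Python 3.8+ and only the standard library. Several assumptions apply. The sequences are hypothetical and are not real proteins. Amino acid composition (AAC) is the only fixed-length feature. A simple nearest-centroid classifier stands in for the "machine learning algorithms" step; it is not a method named in the source. The helper names `aac`, `train_centroids`, `predict`, and `k_fold_accuracy` are illustrative.

```python
# Toy walk-through of the Fig. 1 steps, standard library only.
# Assumptions: the sequences/labels are hypothetical, amino acid
# composition (AAC) is the only feature, and a nearest-centroid
# classifier stands in for "machine learning algorithms".
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Fixed-length (20-value) representation: percent of each residue."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * c / total for c in counts]


def train_centroids(X, y):
    """'Training': the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def k_fold_accuracy(X, y, k=5, seed=0):
    """K-fold cross-validation accuracy; k = len(X) gives leave-one-out."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        centroids = train_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(centroids, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Hypothetical toy dataset: "targets" rich in hydrophobic residues,
# "nontargets" rich in charged residues (illustration only, not real proteins).
targets = ["MLLAVLLIFLGLLAV", "LLVWLAILLGFLVAL", "AILLMLVLFLLAGLW", "LFLLVAMLLIGLVLL"]
nontargets = ["KEDRKSEDKRNQEKT", "EKRDDEKKRSTQNEH", "DEKRKEDNQESKRDT", "RKEDEKRDKQSNEHK"]

X = [aac(s) for s in targets + nontargets]
y = ["target"] * len(targets) + ["nontarget"] * len(nontargets)

accuracy = k_fold_accuracy(X, y, k=4)           # 4-fold cross-validation
loo_accuracy = k_fold_accuracy(X, y, k=len(X))  # leave-one-out
full_centroids = train_centroids(X, y)          # final model on all data
print(f"4-fold accuracy: {accuracy:.2f}, LOO accuracy: {loo_accuracy:.2f}")
print(predict(full_centroids, aac("LLLLLLLLLL")))
```

Replacing `train_centroids` and `predict` with any other classifier leaves the cross-validation loop unchanged. The platforms mentioned earlier (WEKA, KNIME, RapidMiner) also provide built-in cross-validation.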

Human Drug Targets and Their Interactions 23
