Computational Drug Discovery and Design

[42], subcellular location, knowledge of functional domains, and gene
expression profiles can be incorporated as useful features. A variety
of network properties (such as betweenness and connectivity) can also
be calculated and used as features alongside other sequence- and
structure-based features. Integrating different types of
attributes/features should result in more accurate predictors.
Most machine learning techniques are black-box prediction
methods that do not allow interpretation of the relationships between
the attributes used in developing the classification models; rule-based
prediction methods can compensate for this shortcoming. It is
imperative to develop accurate computational methods for predicting
drug targets, as target identification is one of the preliminary stages
of the drug development pipeline, and accuracy at this stage pays
dividends in the later stages of the pipeline.
The human drug target space is enormous, and a prediction
system that can sensitively predict drug targets is needed.
Machine learning methods can augment wet-lab methods. With
the advent of big data technologies, the vast chemical and genomic
space can be mined more successfully and efficiently. As the field
of machine learning continues to develop, the future of predicting
drug targets and their interactions is promising.

4 Notes



  1. The human drug target classification presents an imbalanced-
    data problem: the number of human drug targets is far smaller
    than the number of nondrug targets. Using an algorithm that can
    take data imbalance into account is more suitable; alterna-
    tively, datasets can be balanced using SMOTE and its variants.
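The core SMOTE idea — synthesizing new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors — can be sketched in a few lines. This is a minimal, stdlib-only illustration, not a production implementation (in practice one would use a library such as imbalanced-learn); the function name and toy data are ours.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class samples (SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority-class
    neighbors, at a random position along that segment.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbors of x within the minority class (excluding x)
        neighbors = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# toy 2-D minority class; generate six synthetic samples
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region occupied by the minority class rather than duplicating existing points, which is what distinguishes SMOTE from plain random oversampling.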

  2. In some cases it is useful to standardize/normalize the features,
    which improves learning for certain machine learning algorithms.
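A common form of standardization is the z-score transform: each feature is shifted to mean 0 and scaled to unit standard deviation, so that features on very different scales contribute comparably. A stdlib-only sketch (libraries such as scikit-learn provide this as `StandardScaler`):

```python
import statistics

def standardize(columns):
    """Z-score each feature column: subtract the column mean and divide
    by its (population) standard deviation, giving mean 0 and sd 1."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        out.append([(v - mu) / sigma for v in col])
    return out

# two toy feature columns on very different scales
scaled = standardize([[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]])
```

Note that in a proper evaluation the means and standard deviations must be estimated on the training set only and then applied to the test set, to avoid information leakage.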

  3. There are two major ways to evaluate the performance of
    machine learning algorithms: K-fold cross-validation and train-
    ing/testing set split based evaluation. The two common ver-
    sions of cross-validation are tenfold and fivefold cross-validation.
    Leave-one-out cross-validation (LOOCV) is a special case of
    K-fold cross-validation where K = n (the total number of samples
    in the dataset). In K-fold cross-validation, the dataset is broken
    into K subsets; each subset is then used once as a testing set
    while the remaining (K - 1) subsets are used for building the
    prediction model. In LOOCV, each sample is held out once for
    testing while the model is built on the rest of the samples. If
    the data is plentiful, both tenfold cross-validation and train/test
    split methods of evaluation should be used.
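The K-fold splitting scheme described above can be sketched as index bookkeeping: partition the sample indices into K folds, then use each fold once as the test set. This minimal version splits contiguously for clarity; in practice one would shuffle (and often stratify) first, e.g. with scikit-learn's `KFold` or `LeaveOneOut`.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k folds and return a list of
    (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        splits.append((train, test))
    return splits

splits = k_fold_indices(10, k=5)   # fivefold CV: 5 train/test pairs
loocv = k_fold_indices(10, k=10)   # LOOCV: K = n, one sample per test fold
```

Every sample appears in exactly one test fold across the K rounds, so each prediction is made by a model that never saw that sample during training.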


28 Abhigyan Nath et al.
