Computational Drug Discovery and Design

[42], subcellular location, knowledge of functional domains, and gene
expression profiles can be incorporated as useful features. A variety
of network properties (such as betweenness and connectivity) can also
be calculated and used as features alongside other sequence- and
structure-based features. Integrating different types of
attributes/features should result in more accurate predictors.
Most machine learning techniques are black-box prediction
methods that do not allow interpretation of the relationships between
the attributes used in developing the classification models; rule-based
prediction methods can compensate for this shortcoming. It is
imperative to develop accurate computational methods for predicting
drug targets, as target identification is one of the preliminary stages
of the drug development pipeline, and accuracy at this stage pays
dividends in the later stages of the pipeline.
The human drug target space is enormous, and a prediction
system that can sensitively predict drug targets is needed.
Machine learning methods can augment wet-lab methods. With
the advent of big data technologies, the vast chemical and genomic
space can be mined more successfully and efficiently. As the field
of machine learning continues to develop, the future of predicting
drug targets and their interactions is promising.

4 Notes



  1. The human drug target classification presents an imbalanced-
    data problem: the number of human drug targets is far smaller
    than the number of nondrug targets. Using an algorithm that can
    take data imbalance into account is more suitable; alterna-
    tively, datasets can be balanced using SMOTE and its variants.
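The core SMOTE idea — synthesizing new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors — can be sketched in a few lines. This is a minimal, stdlib-only illustration, not a production implementation (in practice one would use a library such as imbalanced-learn); the function name and toy data are ours.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class samples (SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority-class
    neighbors, at a random position along that segment.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbors of x within the minority class (excluding x)
        neighbors = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic

# toy 2-D minority class; generate six synthetic samples
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region occupied by the minority class rather than duplicating existing points, which is what distinguishes SMOTE from plain random oversampling.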

  2. In some cases it is useful to standardize/normalize the features,
    which improves learning for certain machine learning algorithms.
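A common form of standardization is the z-score transform: each feature is shifted to mean 0 and scaled to unit standard deviation, so that features on very different scales contribute comparably. A stdlib-only sketch (libraries such as scikit-learn provide this as `StandardScaler`):

```python
import statistics

def standardize(columns):
    """Z-score each feature column: subtract the column mean and divide
    by its (population) standard deviation, giving mean 0 and sd 1."""
    out = []
    for col in columns:
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        out.append([(v - mu) / sigma for v in col])
    return out

# two toy feature columns on very different scales
scaled = standardize([[1.0, 2.0, 3.0], [100.0, 200.0, 300.0]])
```

Note that in a proper evaluation the means and standard deviations must be estimated on the training set only and then applied to the test set, to avoid information leakage.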

  3. There are two major ways to evaluate the performance of
    machine learning algorithms: K-fold cross-validation and train-
    ing/testing set split based evaluation. The two common ver-
    sions of cross-validation are tenfold and fivefold cross-validation.
    Leave-one-out cross-validation (LOOCV) is a special case of
    K-fold cross-validation where K = n (the total number of samples
    in the dataset). In K-fold cross-validation, the dataset is broken
    into K subsets; each subset is then used once as a testing set
    while the remaining (K - 1) subsets are used for building the
    prediction model. In LOOCV, each sample is held out once for
    testing while the model is built on the rest of the samples. If
    the data is plentiful, both tenfold cross-validation and train/test
    split methods of evaluation should be used.
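The K-fold splitting scheme described above can be sketched as index bookkeeping: partition the sample indices into K folds, then use each fold once as the test set. This minimal version splits contiguously for clarity; in practice one would shuffle (and often stratify) first, e.g. with scikit-learn's `KFold` or `LeaveOneOut`.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k folds and return a list of
    (train_indices, test_indices) pairs, one pair per fold."""
    indices = list(range(n_samples))
    # distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        splits.append((train, test))
    return splits

splits = k_fold_indices(10, k=5)   # fivefold CV: 5 train/test pairs
loocv = k_fold_indices(10, k=10)   # LOOCV: K = n, one sample per test fold
```

Every sample appears in exactly one test fold across the K rounds, so each prediction is made by a model that never saw that sample during training.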


28 Abhigyan Nath et al.
