Computational Drug Discovery and Design

3.4 Performance Evaluation

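Various threshold-dependent and threshold-independent performance evaluation metrics can be used to judge the performance of machine learning algorithms (see Note 3). In the formulas that follow, TP and TN denote the numbers of correctly predicted drug targets (true positives) and nondrug targets (true negatives), and FP and FN denote the numbers of false positives and false negatives.

Sensitivity: This can be defined as the percentage of drug targets that are correctly predicted.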

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \times 100 \qquad (1)$$

Specificity: This can be defined as the percentage of nondrug targets that are correctly predicted.

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \times 100 \qquad (2)$$

Accuracy: This can be defined as the percentage of all predictions, for drug targets and nondrug targets together, that are correct.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \qquad (3)$$
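As an illustration, the three percentages defined above can be computed directly from the four confusion-matrix counts. The following is only a minimal sketch: the function name `basic_metrics` and the example counts are hypothetical and are not taken from the chapter, and returning NaN for an empty denominator is an assumed convention.

```python
def basic_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) as percentages.

    tp, tn, fp, fn are confusion-matrix counts. A metric whose
    denominator is zero is returned as NaN (an assumed convention).
    """
    nan = float("nan")
    positives = tp + fn          # all actual drug targets
    negatives = tn + fp          # all actual nondrug targets
    total = positives + negatives

    sensitivity = 100.0 * tp / positives if positives else nan
    specificity = 100.0 * tn / negatives if negatives else nan
    accuracy = 100.0 * (tp + tn) / total if total else nan
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only.
sens, spec, acc = basic_metrics(tp=80, tn=90, fp=10, fn=20)
print(f"Sensitivity={sens:.1f}%  Specificity={spec:.1f}%  Accuracy={acc:.1f}%")
```

With these made-up counts, 80 of 100 drug targets and 90 of 100 nondrug targets are recovered, so the three values are 80%, 90%, and 85%.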

Matthews Correlation Coefficient (MCC): This is a useful performance evaluation metric for binary classification problems. Its values range from −1 to +1 (worst to best).

$$\mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}} \qquad (4)$$
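The MCC can likewise be evaluated from the four confusion-matrix counts. The sketch below is illustrative only: the function name `mcc` and the example counts are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here, not something stated in the chapter.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
print(round(mcc(tp=80, tn=90, fp=10, fn=20), 4))
print(mcc(tp=50, tn=50, fp=0, fn=0))   # perfect prediction
print(mcc(tp=0, tn=0, fp=50, fn=50))   # every prediction wrong
```

A perfect classifier gives +1 and a classifier that is wrong on every example gives −1, which matches the stated range.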

Youden’s index: This performance evaluation metric indicates the model’s ability to avoid failures. Higher values are better.

$$Y = \mathrm{Sensitivity} - (1 - \mathrm{Specificity}) \qquad (5)$$
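A short sketch of Youden’s index follows. It assumes that sensitivity and specificity are expressed as fractions between 0 and 1 rather than as percentages, since the constant 1 in the formula only makes sense on that scale. The function name `youden_index` and the counts are hypothetical.

```python
def youden_index(tp, tn, fp, fn):
    """Youden's index, with sensitivity and specificity as fractions in [0, 1]."""
    sensitivity = tp / (tp + fn)   # fraction of drug targets recovered
    specificity = tn / (tn + fp)   # fraction of nondrug targets recovered
    return sensitivity - (1.0 - specificity)


# Hypothetical counts, for illustration only.
y = youden_index(tp=80, tn=90, fp=10, fn=20)
print(round(y, 3))
```

On this scale the index is 0 for a classifier with no discriminating ability (sensitivity equal to the false-positive rate) and 1 for a perfect one.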

Area under the Curve (AUC): The area under the receiver operating characteristic (ROC) curve, known as the AUC, can be used to summarize the ROC curve by a single numerical quantity. Its values range from 0 to 1, and it is threshold independent [39].

g-means: This is the geometric mean of sensitivity and specificity.

$$g\text{-means} = \sqrt{\mathrm{Sensitivity} \times \mathrm{Specificity}} \qquad (6)$$
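Both of the last two metrics are easy to sketch in code. The example below is illustrative and rests on stated assumptions: the AUC is computed through its rank-based (Mann-Whitney) interpretation, that is, the fraction of drug target/nondrug target pairs in which the drug target receives the higher score, with ties counted as one half. This is one standard way to obtain the AUC and is not necessarily the procedure used in the chapter. The function names, labels, and scores are hypothetical.

```python
import math


def g_means(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (same scale for both)."""
    return math.sqrt(sensitivity * specificity)


def auc_from_scores(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    labels: 1 for drug target, 0 for nondrug target.
    scores: classifier outputs, higher meaning "more likely a drug target".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count concordant pairs; a tie contributes one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auc_from_scores(labels, scores), 3))
print(round(g_means(0.8, 0.9), 3))
```

The pairwise count is quadratic in the number of examples, which is acceptable for a small illustration; for large data sets a sort-based implementation is preferable. No decision threshold appears anywhere in `auc_from_scores`, which is what makes the AUC threshold independent.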

3.5 Conclusion and Future Perspective

Machine learning methods have an advantage over sequence alignment-based methods because they can take into account hidden similarities between features when generating successful prediction models. The sequence feature generation step should cover as much of the chemical and genomic space as possible. Protein–protein interaction data, notably from databases such as STRING [40], BioGRID [41], and the Human Protein Reference Database (HPRD)

Human Drug Targets and Their Interactions 27
