3.4 Performance Evaluation
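Various threshold-dependent and threshold-independent performance evaluation metrics can be used to judge the performance of machine learning algorithms (see Note 3). In the definitions below, TP, TN, FP and FN are the numbers of true positives (drug targets predicted as drug targets), true negatives (nondrug targets predicted as nondrug targets), false positives (nondrug targets predicted as drug targets) and false negatives (drug targets predicted as nondrug targets).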
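Threshold-dependent metrics start from these four counts, which exist only once a score cutoff has been chosen. The sketch below is a minimal illustration, assuming a classifier that outputs one score per protein, labels encoded as 1 (drug target) and 0 (nondrug target), and a protein being called a drug target when its score is at least the threshold. The function name `confusion_counts` and the data are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) after thresholding predicted scores.

    labels: 1 = drug target, 0 = nondrug target (assumed encoding).
    A protein is predicted as a drug target when score >= threshold.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1          # drug target correctly predicted
        elif predicted == 0 and y == 0:
            tn += 1          # nondrug target correctly predicted
        elif predicted == 1:
            fp += 1          # nondrug target predicted as drug target
        else:
            fn += 1          # drug target missed
    return tp, tn, fp, fn


# Hypothetical labels and classifier scores for seven proteins
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
print(confusion_counts(labels, scores))        # (2, 3, 1, 1)
print(confusion_counts(labels, scores, 0.25))  # (3, 2, 2, 0)
```

Moving the threshold from 0.5 to 0.25 changes all four counts for the same scores, which is why every metric built on them is threshold dependent.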
Sensitivity: the percentage of actual drug targets that are correctly predicted as drug targets.
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100 \tag{1}$$
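Eq. (1) translates directly into code. This is a minimal sketch with a hypothetical function name and made-up counts; it raises an error when there are no actual drug targets, because the ratio is then undefined.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Eq. (1): percentage of actual drug targets predicted as drug targets."""
    if tp + fn == 0:
        raise ValueError("no actual drug targets: sensitivity is undefined")
    return 100 * tp / (tp + fn)


# Hypothetical counts: 45 of 50 known drug targets are recovered
print(sensitivity(tp=45, fn=5))  # 90.0
```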
Specificity: the percentage of actual nondrug targets that are correctly predicted as nondrug targets.
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100 \tag{2}$$
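Specificity is the mirror image of sensitivity, computed over the nondrug targets only. As a minimal sketch (hypothetical function name and counts):

```python
def specificity(tn: int, fp: int) -> float:
    """Eq. (2): percentage of actual nondrug targets predicted as nondrug targets."""
    if tn + fp == 0:
        raise ValueError("no actual nondrug targets: specificity is undefined")
    return 100 * tn / (tn + fp)


# Hypothetical counts: 80 of 100 nondrug targets are correctly rejected
print(specificity(tn=80, fp=20))  # 80.0
```

Because the two metrics use disjoint sets of proteins, lowering the decision threshold can raise sensitivity while lowering specificity, so they are usually reported together.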
Accuracy: the percentage of all proteins (drug targets and nondrug targets) that are correctly predicted.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \tag{3}$$
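One caveat is worth showing: drug target datasets are often imbalanced, and accuracy can then look high for a model that never predicts a drug target. The sketch below uses hypothetical counts (10 drug targets, 90 nondrug targets) for such a trivial model; the function name is also hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (3): percentage of all proteins that are correctly predicted."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return 100 * (tp + tn) / total


# Trivial model: every protein is labelled a nondrug target
print(accuracy(tp=0, tn=90, fp=0, fn=10))  # 90.0
```

The same model has a sensitivity of 0%, which is why balance-aware metrics such as MCC, Youden's index and g-means are reported alongside accuracy.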
Matthews Correlation Coefficient (MCC): a useful performance evaluation metric for binary classification problems. Its values range from -1 to +1 (worst to best), with 0 corresponding to random prediction.
$$\text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}} \tag{4}$$
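The sketch below implements Eq. (4) and checks the two ends of its range. When any factor under the square root is zero the formula is undefined; returning 0 in that case is a common convention that is assumed here, not something stated in this chapter. The function name and counts are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (4): Matthews correlation coefficient, in [-1, +1]."""
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0 when any row/column total of the
        # confusion matrix is zero (the prediction carries no information).
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(tp=10, tn=90, fp=0, fn=0))   # 1.0  (perfect prediction)
print(mcc(tp=0, tn=0, fp=90, fn=10))   # -1.0 (every prediction inverted)
print(mcc(tp=0, tn=90, fp=0, fn=10))   # 0.0  (all proteins called nondrug targets)
```

Unlike accuracy, MCC gives the trivial all-negative model a score of 0 on the same imbalanced data.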
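Youden's index: This metric indicates the model's ability to avoid failures, i.e. both missed drug targets and false drug target predictions. Higher values are better. In Eq. (5), sensitivity and specificity are taken as fractions (between 0 and 1) rather than percentages.

$$Y = \text{Sensitivity} - (1 - \text{Specificity}) \tag{5}$$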
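Since Eqs. (1) and (2) report percentages, a minimal sketch of Eq. (5) first converts them to fractions. The function name and example values are hypothetical.

```python
def youden_index(sensitivity_pct: float, specificity_pct: float) -> float:
    """Eq. (5), with sensitivity and specificity given as percentages."""
    sens = sensitivity_pct / 100  # convert to fractions in [0, 1]
    spec = specificity_pct / 100
    return sens - (1 - spec)


print(round(youden_index(90.0, 80.0), 2))  # 0.7
print(youden_index(0.0, 100.0))            # 0.0 (trivial all-negative model)
```

A model that simply calls every protein a nondrug target gets Y = 0, even though its accuracy can be high.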
Area under the Curve (AUC): The area under the receiver operating characteristic (ROC) curve, known as AUC, summarizes the ROC curve as a single number. Its values range from 0 to 1, and because the ROC curve is built over all classification thresholds, AUC is threshold independent [39].
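One way to see the threshold independence is to compute AUC without choosing any cutoff. The sketch below uses the Mann-Whitney (rank-sum) formulation, which equals the area under the empirical ROC curve: the probability that a randomly chosen drug target scores higher than a randomly chosen nondrug target, with ties counted as one half. This is a substitute for explicit ROC integration, not necessarily the procedure used in [39]; the function name, label encoding (1 = drug target) and data are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparisons (Mann-Whitney statistic); no threshold needed."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # drug targets
    neg = [s for y, s in zip(labels, scores) if y == 0]  # nondrug targets
    if not pos or not neg:
        raise ValueError("need at least one drug target and one nondrug target")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # drug target ranked above nondrug target
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores for seven proteins
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic and is meant only to make the definition explicit; for large datasets a sort-based rank computation gives the same value faster.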
g-means: the geometric mean of sensitivity and specificity.

$$\text{g-means} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{6}$$
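Because Eq. (6) multiplies the two rates, g-means drops to 0 whenever either one is 0, so it penalizes models that sacrifice one class entirely. A minimal sketch, with a hypothetical function name and example values, taking both inputs as percentages:

```python
import math


def g_means(sensitivity_pct: float, specificity_pct: float) -> float:
    """Eq. (6): geometric mean of sensitivity and specificity (percent scale)."""
    return math.sqrt(sensitivity_pct * specificity_pct)


print(g_means(90.0, 40.0))   # 60.0
print(g_means(0.0, 100.0))   # 0.0 (trivial all-negative model)
```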
3.5 Conclusion and Future Perspective
Machine learning methods have an advantage over sequence alignment-based methods because they can take into account hidden similarities between features when generating successful prediction models. The sequence feature generation step should cover as much of the chemical and genomic space as possible. Protein–protein interaction data, notably from databases such as STRING [40], BioGRID [41] and the Human Protein Reference Database (HPRD)
Human Drug Targets and Their Interactions 27