Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


Different terms are used in different domains. Medics, for example, talk about the sensitivity and specificity of diagnostic tests. Sensitivity refers to the proportion of people with the disease who have a positive test result, that is, tp. Specificity refers to the proportion of people without the disease who have a negative test result, which is 1 - fp. Sometimes the product of these is used as an overall measure:
    sensitivity × specificity = tp(1 - fp) = (TP × TN) / ((TP + FN) × (FP + TN))
Finally, of course, there is our old friend the success rate:
    success rate = (TP + TN) / (TP + FP + TN + FN)
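In code, these four measures follow directly from the counts in a two-class confusion matrix. The counts below are invented for illustration:

```python
def confusion_metrics(TP, FP, TN, FN):
    """Compute sensitivity, specificity, their product, and the success
    rate from the four counts of a two-class confusion matrix."""
    sensitivity = TP / (TP + FN)          # true positive rate, tp
    specificity = TN / (FP + TN)          # 1 - false positive rate, 1 - fp
    product = sensitivity * specificity   # combined overall measure
    success_rate = (TP + TN) / (TP + FP + TN + FN)
    return sensitivity, specificity, product, success_rate

# Hypothetical test results: 40 true positives, 5 false positives,
# 45 true negatives, 10 false negatives.
sens, spec, prod, acc = confusion_metrics(TP=40, FP=5, TN=45, FN=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
      f"product={prod:.2f} success rate={acc:.2f}")
```

Note that a test can have a high success rate while being poorly balanced: the product penalizes a classifier whose sensitivity or specificity is low even when the other is high.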
To summarize ROC curves in a single quantity, people sometimes use the area under the curve (AUC) because, roughly speaking, the larger the area, the better the model. The area also has a nice interpretation as the probability that the classifier ranks a randomly chosen positive instance above a randomly chosen negative one. Although such measures may be useful if costs and class distributions are unknown and one method must be chosen to handle all situations, no single number is able to capture the tradeoff. That can only be done by two-dimensional depictions such as lift charts, ROC curves, and recall–precision diagrams.
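This probabilistic interpretation gives a direct, if inefficient, way to compute the AUC from classifier scores: over all positive–negative pairs, count how often the positive instance is scored higher, with ties counting half. The scores below are made up for illustration:

```python
def auc_by_ranking(pos_scores, neg_scores):
    """AUC computed from its probabilistic interpretation: the chance
    that a randomly chosen positive instance receives a higher score
    than a randomly chosen negative one (ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Three positive and three negative instances with hypothetical scores:
print(auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

A perfect ranking gives an AUC of 1, and random scoring gives 0.5 on average; in practice the area is computed more efficiently from the sorted scores, but the pairwise count above is the definition.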


Cost curves

ROC curves and their relatives are very useful for exploring the tradeoffs among different classifiers over a range of costs. However, they are not ideal for evaluating machine learning models in situations with known error costs. For example, it is not easy to read off the expected cost of a classifier for a fixed cost matrix and class distribution. Neither can you easily determine the ranges of applicability of different classifiers. For example, from the crossover point between the two ROC curves in Figure 5.3 it is hard to tell for what cost and class distributions classifier A outperforms classifier B.
Cost curves are a different kind of display on which a single classifier corresponds to a straight line that shows how the performance varies as the class distribution changes. Again, they work best in the two-class case, although you can always make a multiclass problem into a two-class one by singling out one class and evaluating it against the remaining ones.
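As a sketch of the idea (assuming the standard cost-curve construction, in which a classifier with false negative rate fn and false positive rate fp has expected error fn·p + fp·(1 - p) when the + class occurs with probability p), the straight lines for two hypothetical classifiers can be tabulated like this:

```python
def expected_error(fn_rate, fp_rate, p_plus):
    """Expected error of a classifier with the given false negative and
    false positive rates when the + class occurs with probability p_plus.
    Plotted against p_plus, this is the classifier's line on a cost curve."""
    return fn_rate * p_plus + fp_rate * (1.0 - p_plus)

# The extreme classifiers: always predicting + errs on every - instance
# (error 1 - p), and always predicting - errs on every + instance (error p).
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    a = expected_error(fn_rate=0.10, fp_rate=0.30, p_plus=p)  # hypothetical classifier A
    b = expected_error(fn_rate=0.25, fp_rate=0.15, p_plus=p)  # hypothetical classifier B
    print(f"p(+)={p:.2f}  A={a:.3f}  B={b:.3f}")
```

With these invented rates the two lines cross at p(+) = 0.5: B is preferable when the + class is rare, A when it is common — exactly the kind of range-of-applicability question that is hard to answer from an ROC crossover.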
Figure 5.4(a) plots the expected error against the probability of one of the classes. You could imagine adjusting this probability by resampling the test set in a nonuniform way. We denote the two classes using + and -. The diagonals show the performance of two extreme classifiers: one always predicts +, giving



5.7 COUNTING THE COST 173
