Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

refer to recall for the top ten documents, that is, 8/40 = 20%; while “precision at
10” would be 8/10 = 80%. Information retrieval experts use recall–precision
curves that plot one against the other, for different numbers of retrieved
documents, in just the same way as ROC curves and lift charts, except that because
the axes are different, the curves are hyperbolic in shape and the desired
operating point is toward the upper right.
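
As a minimal sketch of the "at 10" arithmetic, the snippet below computes recall and precision for the top k entries of a ranked list. The function name `recall_precision_at_k` and the list `top_ten` are illustrative assumptions, not part of the text; the figures (8 relevant documents among the top ten, 40 relevant documents overall) are the ones used in the example above.

```python
def recall_precision_at_k(ranked_relevance, total_relevant, k):
    """Return (recall, precision) for the top k documents of a ranked list.

    ranked_relevance: list of booleans, True where the document at that
                      rank is relevant (hypothetical input format).
    total_relevant:   number of relevant documents in the whole collection.
    """
    hits = sum(ranked_relevance[:k])  # relevant documents among the top k
    recall = hits / total_relevant    # fraction of all relevant documents found
    precision = hits / k              # fraction of the top k that are relevant
    return recall, precision


# Illustrative ranking: 8 of the top ten documents are relevant.
top_ten = [True] * 8 + [False] * 2
recall, precision = recall_precision_at_k(top_ten, total_relevant=40, k=10)
print(f"recall at 10 = {recall:.0%}, precision at 10 = {precision:.0%}")
```

Repeating the call for k = 1, 2, 3, ... yields the sequence of (recall, precision) pairs from which a recall–precision curve could be drawn.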

Discussion

Table 5.7 summarizes the three different ways we have met of evaluating the
same basic tradeoff; TP, FP, TN, and FN are the number of true positives, false
positives, true negatives, and false negatives, respectively. You want to choose a
set of instances with a high proportion of yes instances and a high coverage of
the yes instances: you can increase the proportion by (conservatively) using a
smaller coverage, or (liberally) increase the coverage at the expense of the
proportion. Different techniques give different tradeoffs, and can be plotted as
different lines on any of these graphical charts.
People also seek single measures that characterize performance. Two that are
used in information retrieval are 3-point average recall, which gives the average
precision obtained at recall values of 20%, 50%, and 80%, and 11-point average
recall, which gives the average precision obtained at recall values of 0%, 10%,
20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 100%. Also used in information
retrieval is the F-measure, which is:

\[
\frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}
= \frac{2 \cdot TP}{2 \cdot TP + FP + FN}
\]
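
The two forms of the F-measure can be checked against each other numerically. The sketch below is illustrative only: the function names and the example counts (TP = 8, FP = 2, FN = 32) are assumptions chosen for the demonstration, and returning 0.0 when a denominator is zero is a convention adopted here, not something the text specifies.

```python
import math


def f_measure_from_rates(recall, precision):
    """F-measure as 2 * recall * precision / (recall + precision)."""
    if recall + precision == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * recall * precision / (recall + precision)


def f_measure_from_counts(tp, fp, fn):
    """F-measure as 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return 0.0  # assumed convention for the undefined case
    return 2 * tp / denominator


# Hypothetical counts: 8 true positives, 2 false positives, 32 false negatives.
tp, fp, fn = 8, 2, 32
recall = tp / (tp + fn)      # TP / (TP + FN)
precision = tp / (tp + fp)   # TP / (TP + FP)

f_rates = f_measure_from_rates(recall, precision)
f_counts = f_measure_from_counts(tp, fp, fn)
print(f_rates, f_counts)
print(math.isclose(f_rates, f_counts))
```

Both functions compute the same quantity; the count-based form avoids computing recall and precision separately.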

172 CHAPTER 5| CREDIBILITY: EVALUATING WHAT’S BEEN LEARNED


Table 5.7 Different measures used to evaluate the false positive versus the false
negative tradeoff.

                         Domain                  Plot                    Explanation of axes

lift chart               marketing               TP vs. subset size      TP: number of true positives
                                                                         subset size: (TP + FP) / (TP + FP + TN + FN) × 100%

ROC curve                communications          TP rate vs. FP rate     TP rate: tp = TP / (TP + FN) × 100%
                                                                         FP rate: fp = FP / (FP + TN) × 100%

recall–precision curve   information retrieval   recall vs. precision    recall: same as TP rate tp
                                                                         precision: TP / (TP + FP) × 100%
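
The axis quantities in Table 5.7 all derive from the same four counts. The following sketch computes them as percentages; the function name `tradeoff_measures`, the dictionary keys, and the example counts are illustrative assumptions rather than anything defined in the text.

```python
def tradeoff_measures(tp, fp, tn, fn):
    """Compute the axis quantities of Table 5.7, each as a percentage.

    Assumes every denominator is nonzero (at least one instance in each
    relevant group); no special handling is attempted here.
    """
    total = tp + fp + tn + fn
    tp_rate = 100 * tp / (tp + fn)
    return {
        "subset_size": 100 * (tp + fp) / total,  # lift chart horizontal axis
        "tp_rate": tp_rate,                      # ROC curve vertical axis
        "fp_rate": 100 * fp / (fp + tn),         # ROC curve horizontal axis
        "recall": tp_rate,                       # same as the TP rate
        "precision": 100 * tp / (tp + fp),
    }


# Hypothetical confusion counts for a collection of 100 instances.
measures = tradeoff_measures(tp=8, fp=2, tn=58, fn=32)
for name, value in measures.items():
    print(f"{name}: {value:.1f}%")
```

Evaluating the function at successive cutoffs of a ranked list gives the points needed for any of the three plots, since they differ only in which pair of quantities is placed on the axes.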