Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


of positives included in the sample on the vertical axis, expressed as a percentage
of the total number of positives, against the number of negatives included
in the sample, expressed as a percentage of the total number of negatives, on
the horizontal axis. The vertical axis is the same as that of the lift chart except
that it is expressed as a percentage. The horizontal axis is slightly different:
number of negatives rather than sample size. However, in direct marketing situations
in which the proportion of positives is very small anyway (like 0.1%),
there is negligible difference between the size of a sample and the number of
negatives it contains, so the ROC curve and lift chart look very similar. As with
lift charts, the northwest corner is the place to be.
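The two axes can be made concrete with a minimal Python sketch. It assumes the test instances have already been ranked from most to least likely to be positive, and that they are given as a list of booleans (True for a positive). The function name `roc_point` and this list representation are illustrative choices, not anything the text prescribes. The function cuts the ranked list after its first k instances and returns the ROC coordinates for that cut as percentages.

```python
def roc_point(ranked_labels, k):
    """Return (false-positive %, true-positive %) for a cut after the top k.

    ranked_labels: booleans ordered from most to least likely to be positive
    (True = positive, "yes"). Assumes at least one positive and one negative.
    """
    total_pos = sum(ranked_labels)
    total_neg = len(ranked_labels) - total_pos
    top = ranked_labels[:k]              # the "sample": the k highest-ranked
    pos_in_sample = sum(top)
    neg_in_sample = len(top) - pos_in_sample
    tp_pct = 100.0 * pos_in_sample / total_pos   # vertical axis
    fp_pct = 100.0 * neg_in_sample / total_neg   # horizontal axis
    return fp_pct, tp_pct


# Hypothetical ranking: 100,000 instances, only 100 positives (0.1%).
ranked = [True] * 40 + [False] * 9960 + [True] * 60 + [False] * 89940
fp, tp = roc_point(ranked, 10000)
sample_pct = 100.0 * 10000 / len(ranked)  # lift-chart style horizontal axis
print(f"{fp:.2f} {tp:.1f} {sample_pct:.1f}")  # prints "9.97 40.0 10.0"
```

With positives this rare, the cut after the top 10% of instances contains about 9.97% of all negatives. The ROC and lift-chart horizontal coordinates therefore almost coincide, which is why the two charts look so similar.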
Figure 5.2 shows an example ROC curve—the jagged line—for the sample of
test data in Table 5.6. You can follow it along with the table. From the origin,
go up two (two positives), along one (one negative), up five (five positives),
along one (one negative), up one, along one, up two, and so on. Each point
corresponds to drawing a line at a certain position on the ranked list, counting the
yes’s and no’s above it, and plotting them vertically and horizontally, respectively.
As you go farther down the list, corresponding to a larger sample, both the number
of positives and the number of negatives increase.
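The walk just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the book. The names `roc_steps` and `to_percentages` are hypothetical helpers, and the input is assumed to be the test instances already sorted by their predicted likelihood of being a yes, encoded as booleans. The usage below feeds in only the opening moves quoted above (up two, along one, up five, along one, up one, along one, up two), not the whole of Table 5.6, which is not reproduced here.

```python
def roc_steps(ranked_labels):
    """Trace the jagged ROC line as a list of (no's, yes's) count pairs.

    Walking down the ranked list, each yes moves the line up one unit and
    each no moves it along one unit. Point i counts the yes's and no's above
    a cut drawn after the first i instances; point 0 is the origin.
    """
    points = [(0, 0)]
    yes = no = 0
    for is_yes in ranked_labels:
        if is_yes:
            yes += 1
        else:
            no += 1
        points.append((no, yes))
    return points


def to_percentages(points):
    """Rescale count pairs so both axes run from 0 to 100%.

    Assumes the full list contains at least one yes and one no.
    """
    total_no, total_yes = points[-1]
    return [(100.0 * n / total_no, 100.0 * y / total_yes) for n, y in points]


# Only the opening moves described in the text, not the full test set.
labels = [s == "yes" for s in "yes yes no yes yes yes yes yes no yes no yes yes".split()]
pts = roc_steps(labels)
print(pts)  # ends at (3, 10): 3 no's and 10 yes's so far
```

Because every yes adds a vertical step and every no a horizontal one, the result is a staircase. This sketch ignores tied scores, which would otherwise produce diagonal segments.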
The jagged ROC line in Figure 5.2 depends intimately on the details of the
particular sample of test data. This sample dependence can be reduced by applying
cross-validation. For each different number of no’s (that is, each position
along the horizontal axis), take just enough of the highest-ranked instances to
include that number of no’s, and count the number of yes’s they contain. Finally,
average that number over different folds of the cross-validation. The result is a


5.7 COUNTING THE COST 169


Figure 5.2 A sample ROC curve. [Plot: true positives (0 to 100%) against false positives (0 to 100%).]
