So far we have not taken costs into account, or rather we have used the default
cost matrix in which all errors cost the same. Cost curves, which do take cost
into account, look very similar—very similar indeed—but the axes are different. Figure 5.4(b) shows a cost curve for the same classifier A (note that the vertical scale has been enlarged, for convenience, and ignore the gray lines for now).
It plots the expected cost of using A against the probability cost function, which is a distorted version of p[+] that retains the same extremes: zero when p[+] = 0 and one when p[+] = 1. Denote by C[+|-] the cost of predicting + when the instance is actually -, and the reverse by C[-|+]. Then the axes of Figure 5.4(b) are

Normalized expected cost = fn × pC[+] + fp × (1 − pC[+])

Probability cost function pC[+] = p[+] C[-|+] / (p[+] C[-|+] + p[-] C[+|-])

We are assuming here that correct predictions have no cost: C[+|+] = C[-|-] = 0. If that is not the case, the formulas are a little more complex.

The maximum value that the normalized expected cost can have is 1—that is why it is “normalized.” One nice thing about cost curves is that the extreme
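The two axis quantities can be computed directly from a classifier's error rates and the cost matrix. Here is a minimal sketch in Python; the function names and the example numbers (p[+] = 0.5, costs 2 and 1, error rates 0.2 and 0.1) are illustrative assumptions, not values from the text:

```python
def probability_cost_function(p_pos, c_fn, c_fp):
    """pC[+]: the distorted version of p[+] used on the horizontal axis.
    c_fn is C[-|+] (cost of predicting - when the instance is +),
    c_fp is C[+|-] (cost of predicting + when the instance is -).
    Assumes correct predictions have zero cost."""
    p_neg = 1.0 - p_pos
    return (p_pos * c_fn) / (p_pos * c_fn + p_neg * c_fp)

def normalized_expected_cost(fn_rate, fp_rate, pc_pos):
    """Expected cost of using the classifier, normalized to lie in [0, 1]."""
    return fn_rate * pc_pos + fp_rate * (1.0 - pc_pos)

# Illustrative example: p[+] = 0.5, C[-|+] = 2, C[+|-] = 1
pc = probability_cost_function(0.5, 2.0, 1.0)    # 2/3
# A classifier with false negative rate 0.2 and false positive rate 0.1:
cost = normalized_expected_cost(0.2, 0.1, pc)    # about 0.167
```

Note that pC[+] keeps the same extremes as p[+]: it is 0 when p[+] = 0 and 1 when p[+] = 1, whatever the costs.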
[Figure 5.4 (continued): (b) cost curve for classifier A, with a second classifier B shown for comparison. Horizontal axis: probability cost function pC[+], from 0 to 1; vertical axis: normalized expected cost, from 0 to 0.5; the extremes of the curve are marked fn and fp.]