Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


So far we have not taken costs into account, or rather we have used the default
cost matrix in which all errors cost the same. Cost curves, which do take cost
into account, look very similar, very similar indeed, but the axes are different.
Figure 5.4(b) shows a cost curve for the same classifier A (note that the vertical
scale has been enlarged, for convenience, and ignore the gray lines for now).
It plots the expected cost of using A against the probability cost function, which
is a distorted version of p[+] that retains the same extremes: zero when p[+] =
0 and one when p[+] = 1. Denote by C[+|-] the cost of predicting + when the
instance is actually -, and the reverse by C[-|+]. Then the axes of Figure 5.4(b)
are


    Normalized expected cost = fn × pC[+] + fp × (1 − pC[+])

    Probability cost function pC[+] = p[+] × C[-|+] / (p[+] × C[-|+] + p[-] × C[+|-])

We are assuming here that correct predictions have no cost: C[+|+] = C[-|-] =
0. If that is not the case the formulas are a little more complex.

The maximum value that the normalized expected cost can have is 1, which is
why it is called "normalized." One nice thing about cost curves is that the extreme
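As a quick sketch, the two quantities defined above can be computed directly. The function and parameter names below are mine, chosen for illustration; they are not from the book's software:

```python
def probability_cost(p_pos, c_fn, c_fp):
    """Probability cost function pC[+]: a distorted version of p[+]
    that keeps the same extremes (0 when p[+] = 0, 1 when p[+] = 1).
    c_fn is C[-|+] (cost of a false negative);
    c_fp is C[+|-] (cost of a false positive)."""
    p_neg = 1.0 - p_pos
    return (p_pos * c_fn) / (p_pos * c_fn + p_neg * c_fp)

def normalized_expected_cost(fn_rate, fp_rate, pc_pos):
    """Expected cost of a classifier with false negative rate fn_rate and
    false positive rate fp_rate, normalized to lie between 0 and 1."""
    return fn_rate * pc_pos + fp_rate * (1.0 - pc_pos)

# With equal error costs, pC[+] reduces to p[+] itself:
assert abs(probability_cost(0.3, 1.0, 1.0) - 0.3) < 1e-12
```

Note that for a fixed classifier the normalized expected cost is linear in pC[+], which is why each classifier appears as a straight line on a cost curve.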

5.7 COUNTING THE COST 175


[Figure 5.4(continued) (b): cost curves for classifiers A and B. Horizontal axis: probability cost function pC[+], from 0 to 1; vertical axis: normalized expected cost, from 0 to 0.5. Each classifier's line runs from fp at pC[+] = 0 to fn at pC[+] = 1.]
