So far we have not taken costs into account, or rather we have used the default
cost matrix in which all errors cost the same. Cost curves, which do take cost
into account, look very similar—very similar indeed—but the axes are different. Figure 5.4(b) shows a cost curve for the same classifier A (note that the vertical scale has been enlarged, for convenience, and ignore the gray lines for now).
It plots the expected cost of using A against the probability cost function, which is a distorted version of p[+] that retains the same extremes: zero when p[+] = 0 and one when p[+] = 1. Denote by C[+|-] the cost of predicting + when the instance is actually -, and the reverse by C[-|+]. Then the axes of Figure 5.4(b) are

Normalized expected cost = fn × pC[+] + fp × (1 − pC[+])

Probability cost function pC[+] = p[+] C[-|+] / (p[+] C[-|+] + p[-] C[+|-])

We are assuming here that correct predictions have no cost: C[+|+] = C[-|-] = 0. If that is not the case, the formulas are a little more complex.

The maximum value that the normalized expected cost can have is 1—that is why it is “normalized.” One nice thing about cost curves is that the extreme
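The two axis quantities can be computed directly from a classifier's error rates and the cost matrix. Here is a minimal sketch in Python; the function names and the example numbers (p[+] = 0.5, costs 2 and 1, error rates 0.2 and 0.1) are illustrative assumptions, not values from the text:

```python
def probability_cost_function(p_pos, c_fn, c_fp):
    """pC[+]: the distorted version of p[+] used on the horizontal axis.
    c_fn is C[-|+] (cost of predicting - when the instance is +),
    c_fp is C[+|-] (cost of predicting + when the instance is -).
    Assumes correct predictions have zero cost."""
    p_neg = 1.0 - p_pos
    return (p_pos * c_fn) / (p_pos * c_fn + p_neg * c_fp)

def normalized_expected_cost(fn_rate, fp_rate, pc_pos):
    """Expected cost of using the classifier, normalized to lie in [0, 1]."""
    return fn_rate * pc_pos + fp_rate * (1.0 - pc_pos)

# Illustrative example: p[+] = 0.5, C[-|+] = 2, C[+|-] = 1
pc = probability_cost_function(0.5, 2.0, 1.0)    # 2/3
# A classifier with false negative rate 0.2 and false positive rate 0.1:
cost = normalized_expected_cost(0.2, 0.1, pc)    # about 0.167
```

Note that pC[+] keeps the same extremes as p[+]: it is 0 when p[+] = 0 and 1 when p[+] = 1, whatever the costs.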
[Figure 5.4 (continued): (b) cost curve for classifier A, with a second classifier B shown for comparison. Horizontal axis: probability cost function pC[+], from 0 to 1; vertical axis: normalized expected cost, from 0 to 0.5; the extremes of the curve are marked fn and fp.]