Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


different outcomes in the cost matrix can be estimated—say, using cross-
validation—it is straightforward to perform this kind of financial analysis.
Given a cost matrix, you can calculate the cost of a particular learned model
on a given test set just by summing the relevant elements of the cost matrix for
the model’s prediction for each test instance. Here, the costs are ignored when
making predictions, but taken into account when evaluating them.
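As a minimal sketch of this bookkeeping (the 0/1 cost matrix and the class labels below are made up for illustration, with rows indexing the actual class and columns the predicted class):

```python
import numpy as np

# Hypothetical cost matrix for three classes a, b, c:
# rows = actual class, columns = predicted class.
cost_matrix = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])

def total_cost(actual, predicted, cost_matrix):
    """Sum the relevant cost-matrix entry for each test instance."""
    return sum(cost_matrix[a][p] for a, p in zip(actual, predicted))

# Classes encoded as 0 = a, 1 = b, 2 = c; labels are invented.
actual    = [0, 1, 2, 0, 1]
predicted = [0, 2, 2, 1, 1]
print(total_cost(actual, predicted, cost_matrix))  # two errors at cost 1 each -> 2
```

With a 0/1 cost matrix like this one the total cost is simply the error count; a nonuniform matrix would weight different kinds of mistakes differently.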
If the model outputs the probability associated with each prediction, it can
be adjusted to minimize the expected cost of the predictions. Given a set of pre-
dicted probabilities for each outcome on a certain test instance, one normally
selects the most likely outcome. Instead, the model could predict the class with
the smallest expected misclassification cost. For example, suppose in a three-
class situation the model assigns the classes a, b, and c to a test instance with
probabilities pa, pb, and pc, and the cost matrix is that in Table 5.5(b). If it
predicts a, the expected cost of the prediction is obtained by multiplying the
first column of the matrix, [0, 1, 1], by the probability vector, [pa, pb, pc],
yielding pb + pc, or 1 - pa because the three probabilities sum to 1. Similarly,
the costs for predicting the other two classes are 1 - pb and 1 - pc. For this
cost matrix, choosing the prediction with the lowest expected cost is the same as
choosing the one with the greatest probability. For a different cost matrix it
might be different.
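The calculation above can be sketched as follows; the probability values are invented, and the 0/1 cost matrix matches the column [0, 1, 1] given for Table 5.5(b):

```python
import numpy as np

def min_expected_cost_class(probs, cost_matrix):
    """Choose the class whose prediction minimizes expected cost.

    The expected cost of predicting class j is sum_i probs[i] * cost[i][j],
    i.e. the probability vector times column j of the cost matrix.
    """
    expected = probs @ cost_matrix   # one expected cost per candidate class
    return int(np.argmin(expected)), expected

probs = np.array([0.5, 0.3, 0.2])    # hypothetical pa, pb, pc
cost  = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]])        # 0/1 costs, as in Table 5.5(b)

best, expected = min_expected_cost_class(probs, cost)
# expected ~ [1-pa, 1-pb, 1-pc] = [0.5, 0.7, 0.8]; best = 0, i.e. class a
```

Because every misclassification costs 1 here, minimizing expected cost picks the most probable class, as the text notes; with unequal costs the two choices can diverge.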
We have assumed that the learning method outputs probabilities, as Naïve
Bayes does. Even if a classifier does not normally output probabilities, most can
easily be adapted to compute them. In a decision tree, for example, the
probability distribution for a test instance is just the distribution of classes
at the corresponding leaf.
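For instance, normalizing a leaf's class counts gives the probability distribution for any test instance that reaches it (the counts below are hypothetical):

```python
# Hypothetical leaf reached by 10 training instances:
# 6 of class a, 3 of class b, 1 of class c.
leaf_counts = {"a": 6, "b": 3, "c": 1}
total = sum(leaf_counts.values())

# Relative frequencies serve as the predicted class probabilities.
probs = {cls: n / total for cls, n in leaf_counts.items()}
print(probs)  # {'a': 0.6, 'b': 0.3, 'c': 0.1}
```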


Cost-sensitive learning

We have seen how a classifier, built without taking costs into consideration, can
be used to make predictions that are sensitive to the cost matrix. In this case,
costs are ignored at training time but used at prediction time. An alternative is
to do just the opposite: take the cost matrix into account during the training
process and ignore costs at prediction time. In principle, better performance
might be obtained if the classifier were tailored by the learning algorithm to the
cost matrix.
In the two-class situation, there is a simple and general way to make any
learning method cost sensitive. The idea is to generate training data with a
different proportion of yes and no instances. Suppose that you artificially
increase the number of no instances by a factor of 10 and use the resulting
dataset for training. If the learning scheme is striving to minimize the number
of errors, it will come up with a decision structure that is biased toward
avoiding errors on the no instances, because such errors are effectively
penalized 10-fold. If data

