

However, we do not have to use the particular code words shown. Indeed,
there is no reason why each class must be represented by 4 bits. Look instead at
the code of Table 7.1(b), where classes are represented by 7 bits. When applied
to a dataset, seven classifiers must be built instead of four. To see what that might
buy, consider the classification of a particular instance. Suppose it belongs to
class a, and that the predictions of the individual classifiers are 1 0 1 1 1 1 1
(respectively). Obviously, comparing this code word with those in Table 7.1(b),
the second classifier has made a mistake: it predicted 0 instead of 1, no instead
of yes. However, comparing the predicted bits with the code word associated
with each class, the instance is clearly closer to a than to any other class. This
can be quantified by the number of bits that must be changed to convert the
predicted code word into those of Table 7.1(b): the Hamming distance, or the
discrepancy between the bit strings, is 1, 3, 3, and 5 for the classes a, b, c, and d,
respectively. We can safely conclude that the second classifier made a mistake
and correctly identify a as the instance's true class.
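
To make the decoding step concrete, here is a minimal Python sketch (ours, not from the book) of nearest-code-word decoding; the class labels and code words are taken from Table 7.1(b), while the helper names are our own.

# Decoding with the 7-bit error-correcting code of Table 7.1(b)

CODE_WORDS = {          # class -> code word, from Table 7.1(b)
    "a": "1111111",
    "b": "0000111",
    "c": "0011001",
    "d": "0101010",
}

def hamming(x, y):
    """Number of bit positions in which two equal-length strings differ."""
    return sum(p != q for p, q in zip(x, y))

def decode(predicted):
    """Pick the class whose code word is closest to the predicted bits."""
    return min(CODE_WORDS, key=lambda c: hamming(predicted, CODE_WORDS[c]))

predicted = "1011111"   # the example above: the second classifier erred
for c, word in CODE_WORDS.items():
    print(c, hamming(predicted, word))   # distances 1, 3, 3, 5
print(decode(predicted))                 # -> 'a'

Running the sketch reproduces the distances 1, 3, 3, and 5 computed above, and the single flipped bit does not prevent the correct class a from being recovered.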
The same kind of error correction is not possible with the code words of Table
7.1(a), because any predicted string of 4 bits other than these four 4-bit words
has the same distance to at least two of them. The output codes are not “error
correcting.”
What determines whether a code is error correcting or not? Consider the
Hamming distance between the code words representing different classes. The
number of errors that can possibly be corrected depends on the minimum dis-
tance between any pair of code words, say d. The code can guarantee to correct
up to (d - 1)/2 single-bit errors, because if this number of bits of the correct
code word are flipped, it will still be the closest and will therefore be identified
correctly. In Table 7.1(a) the Hamming distance between each pair of code words
is 2. Hence, the minimum distance d is also 2, and we can correct no more than
0 errors! However, in the code of Table 7.1(b) the minimum distance is 4 (in
fact, the distance is 4 for every pair). That means it is guaranteed to correct
single-bit errors.
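
The same check can be automated. The following sketch (again ours, assuming the bit-string representation used above) computes the minimum pairwise distance d for each code in Table 7.1 and the number of single-bit errors it is guaranteed to correct.

# Measuring the error-correcting capacity of the codes in Table 7.1
from itertools import combinations

def hamming(x, y):
    """Number of bit positions in which two equal-length strings differ."""
    return sum(p != q for p, q in zip(x, y))

standard = ["1000", "0100", "0010", "0001"]                      # Table 7.1(a)
error_correcting = ["1111111", "0000111", "0011001", "0101010"]  # Table 7.1(b)

for name, words in [("(a) standard", standard),
                    ("(b) error-correcting", error_correcting)]:
    d = min(hamming(x, y) for x, y in combinations(words, 2))
    print(f"{name}: minimum distance d = {d}, "
          f"corrects up to {(d - 1) // 2} single-bit error(s)")
# (a) standard: minimum distance d = 2, corrects up to 0 single-bit error(s)
# (b) error-correcting: minimum distance d = 4, corrects up to 1 single-bit error(s)

Note the integer division: with d = 2, (d - 1)/2 rounds down to 0, which is why the standard code of Table 7.1(a) corrects nothing.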


Table 7.1 Transforming a multiclass problem into a two-class one:
(a) standard method and (b) error-correcting code.

        (a)                          (b)
Class   Class vector         Class   Class vector
a       1 0 0 0              a       1 1 1 1 1 1 1
b       0 1 0 0              b       0 0 0 0 1 1 1
c       0 0 1 0              c       0 0 1 1 0 0 1
d       0 0 0 1              d       0 1 0 1 0 1 0
