We have identified one property of a good error-correcting code: the code
words must be well separated in terms of their Hamming distance. Because they
comprise the rows of the code table, this property is called row separation.There
is a second requirement that a good error-correcting code should fulfill:column
separation.The Hamming distance between every pair of columns must be
large, as must the distance between each column and the complement of every
other column. In Table 7.1(b), the seven columns are separated from one
another (and their complements) by at least 1 bit.
Column separation is necessary because if two columns are identical
(or if one is the complement of another), the corresponding classifiers will
make the same errors. Error correction is weakened if the errors are
correlated—in other words, if many bit positions are simultaneously incorrect.
The greater the distance between columns, the more errors are likely to be
corrected.
With fewer than four classes it is impossible to construct an effective
error-correcting code because good row separation and good column separa-
tion cannot be achieved simultaneously. For example, with three classes
there are only eight possible columns (2^3 ), four of which are complements
of the other four. Moreover, columns with all zeroes or all ones provide
no discrimination. This leaves just three possible columns, and the resulting
code is not error correcting at all. (In fact, it is the standard “one-per-class”
encoding.)
If there are few classes, an exhaustive error-correcting code such as the one
in Table 7.1(b) can be built. In an exhaustive code for kclasses, the columns
comprise every possible k-bit string, except for complements and the trivial all-
zero or all-one strings. Each code word contains 2k-^1 - 1 bits. The code is con-
structed as follows: the code word for the first class consists of all ones; that for
the second class has 2k-^2 zeroes followed by 2k-^2 - 1 ones; the third has 2k-^3 zeroes
followed by 2k-^3 ones followed by 2k-^3 zeroes followed by 2k-^3 - 1 ones; and so
on. The ith code word consists of alternating runs of 2k-izeroes and ones, the
last run being one short.
With more classes, exhaustive codes are infeasible because the number of
columns increases exponentially and too many classifiers have to be built. In
that case more sophisticated methods are employed, which can build a code with
good error-correcting properties from a smaller number of columns.
Error-correcting output codes do not work for local learning algorithms such
as instance-based learners, which predict the class of an instance by looking at
nearby training instances. In the case of a nearest-neighbor classifier, all output
bits would be predicted using the same training instance. The problem can be
circumvented by using different attribute subsets to predict each output bit,
decorrelating the predictions.
336 CHAPTER 7| TRANSFORMATIONS: ENGINEERING THE INPUT AND OUTPUT