Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

We have identified one property of a good error-correcting code: the code
words must be well separated in terms of their Hamming distance. Because they
comprise the rows of the code table, this property is called row separation. There
is a second requirement that a good error-correcting code should fulfill: column
separation. The Hamming distance between every pair of columns must be
large, as must the distance between each column and the complement of every
other column. In Table 7.1(b), the seven columns are separated from one
another (and from their complements) by at least 1 bit.
Column separation is necessary because if two columns are identical
(or if one is the complement of another), the corresponding classifiers will
make the same errors. Error correction is weakened if the errors are
correlated—in other words, if many bit positions are simultaneously incorrect.
The greater the distance between columns, the more errors are likely to be
corrected.
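Both separation properties can be checked mechanically. The sketch below hard-codes a four-class code table whose rows follow the exhaustive-code construction given later in this section (we assume these rows match Table 7.1(b)), and computes the minimum row and column distances:

```python
# Sketch: checking row and column separation for a 4-class code table.
# The rows below follow the exhaustive-code construction described in the
# text; we assume they match Table 7.1(b).
rows = [
    "1111111",  # class 1
    "0000111",  # class 2
    "0011001",  # class 3
    "0101010",  # class 4
]

def hamming(a, b):
    """Number of bit positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def complement(bits):
    return "".join("1" if b == "0" else "0" for b in bits)

# Row separation: minimum Hamming distance between any pair of code words.
min_row = min(hamming(r1, r2)
              for i, r1 in enumerate(rows) for r2 in rows[i + 1:])

# Column separation: minimum distance between any pair of distinct columns,
# also counting each column against the complement of the other.
cols = ["".join(row[j] for row in rows) for j in range(len(rows[0]))]
min_col = min(min(hamming(c1, c2), hamming(c1, complement(c2)))
              for i, c1 in enumerate(cols) for c2 in cols[i + 1:])

print(min_row)  # 4 -- every pair of code words differs in at least 4 bits
print(min_col)  # 1 -- columns differ from one another (and complements) by >= 1 bit
```

With row distance 4, up to one incorrect output bit can be corrected: the corrupted output is still closer to the true code word than to any other.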
With fewer than four classes it is impossible to construct an effective
error-correcting code because good row separation and good column separa-
tion cannot be achieved simultaneously. For example, with three classes
there are only eight possible columns (2^3), four of which are complements
of the other four. Moreover, columns with all zeroes or all ones provide
no discrimination. This leaves just three possible columns, and the resulting
code is not error correcting at all. (In fact, it is the standard “one-per-class”
encoding.)
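This counting argument is easy to reproduce. A minimal sketch for k = 3: enumerate all 3-bit columns, discard the trivial all-zero and all-one columns, and keep only one column from each complementary pair:

```python
from itertools import product

# Sketch: with k = 3 classes, enumerate every possible 3-bit column and
# discard the trivial columns plus one member of each complementary pair.
k = 3
columns = ["".join(bits) for bits in product("01", repeat=k)]  # 2^3 = 8 columns

def complement(bits):
    return "".join("1" if b == "0" else "0" for b in bits)

usable = []
for col in columns:
    if col in ("0" * k, "1" * k):   # all-zero / all-one: no discrimination
        continue
    if complement(col) in usable:   # complement makes exactly the same errors
        continue
    usable.append(col)

print(usable)  # ['001', '010', '011'] -- just three columns survive
```

Only three columns remain, and (up to complementing individual columns) they form the standard one-per-class encoding, which corrects no errors.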
If there are few classes, an exhaustive error-correcting code such as the one
in Table 7.1(b) can be built. In an exhaustive code for k classes, the columns
comprise every possible k-bit string, except for complements and the trivial all-
zero or all-one strings. Each code word contains 2^(k-1) - 1 bits. The code is con-
structed as follows: the code word for the first class consists of all ones; that for
the second class has 2^(k-2) zeroes followed by 2^(k-2) - 1 ones; the third has 2^(k-3) zeroes
followed by 2^(k-3) ones followed by 2^(k-3) zeroes followed by 2^(k-3) - 1 ones; and so
on. The ith code word consists of alternating runs of 2^(k-i) zeroes and ones, the
last run being one short.
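The construction just described can be sketched compactly. One equivalent way to generate it is to take as columns the binary representations of the numbers 2^(k-1) through 2^k - 2 (every k-bit string that starts with a 1, except the all-ones string) and read off the code words as the rows; bit i of these numbers then alternates in runs of exactly 2^(k-i), reproducing the run pattern above:

```python
def exhaustive_code(k):
    """Code words (rows) of the exhaustive error-correcting code for k classes.

    Columns are all k-bit strings beginning with 1, except the all-ones
    string; transposing them yields the alternating-run code words.
    """
    cols = [format(n, "0{}b".format(k)) for n in range(2 ** (k - 1), 2 ** k - 1)]
    # Transpose: code word i collects bit i of every column.
    return ["".join(col[i] for col in cols) for i in range(k)]

for word in exhaustive_code(4):
    print(word)
# 1111111  <- class 1: all ones
# 0000111  <- class 2: 4 zeroes, then 3 ones
# 0011001  <- class 3: runs of 2, last run one short
# 0101010  <- class 4: alternating single bits, last run one short
```

For k = 4 this produces the seven-column code of Table 7.1(b); note how each code word i is built from runs of length 2^(4-i), with the final run of ones cut one bit short.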
With more classes, exhaustive codes are infeasible because the number of
columns increases exponentially and too many classifiers have to be built. In
that case more sophisticated methods are employed, which can build a code with
good error-correcting properties from a smaller number of columns.
Error-correcting output codes do not work for local learning algorithms such
as instance-based learners, which predict the class of an instance by looking at
nearby training instances. In the case of a nearest-neighbor classifier, all output
bits would be predicted using the same training instance. The problem can be
circumvented by using different attribute subsets to predict each output bit,
decorrelating the predictions.

336 CHAPTER 7 | TRANSFORMATIONS: ENGINEERING THE INPUT AND OUTPUT
