described overfitting-avoidance bias in Chapter 1 (page 35), and we will
encounter this problem repeatedly in subsequent chapters.
For 1R, overfitting is likely to occur whenever an attribute has a large
number of possible values. Consequently, when discretizing a numeric attribute,
a rule is adopted that dictates a minimum number of examples of the
majority class in each partition. Suppose that minimum is set at three. This
eliminates all but two of the preceding partitions. Instead, the partitioning
process begins
yes no yes yes | yes ...
ensuring that there are three occurrences of yes, the majority class, in the first
partition. However, because the next example is also yes, we lose nothing by
including that in the first partition, too. This leads to a new division:
yes no yes yes yes | no no yes yes yes | no yes yes no
where each partition contains at least three instances of the majority class, except
the last one, which will usually have fewer. Partition boundaries always fall
between examples of different classes.
Whenever adjacent partitions have the same majority class, as do the first two
partitions above, they can be merged together without affecting the meaning of
the rule sets. Thus the final discretization is
yes no yes yes yes no no yes yes yes | no yes yes no
which leads to the rule set
temperature: ≤ 77.5 → yes
             > 77.5 → no
The second rule involved an arbitrary choice; as it happens, no was chosen. If
we had chosen yes instead, there would be no need for any breakpoint at all.
As this example illustrates, it might be better to use the adjacent categories
to help break ties. In fact, this rule generates five errors on the training set
and so is less effective than the preceding rule for outlook. However, the same
procedure leads to this rule for humidity:
humidity: ≤ 82.5 → yes
          > 82.5 and ≤ 95.5 → no
          > 95.5 → yes
This generates only three errors on the training set and is the best “1-rule” for
the data in Table 1.3.
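The discretization procedure walked through above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, ties for the majority class are broken in favor of the class seen first (an arbitrary choice, as the text notes), and the numeric values are assumed to be the temperature and humidity columns of the weather data, because only the class sequence appears in the text here.

```python
from collections import Counter

def majority(labels):
    """Most frequent label and its count; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0]

def discretize(pairs, min_count=3):
    """Return (breakpoints, classes, errors) for a list of (value, label) pairs."""
    pairs = sorted(pairs, key=lambda p: p[0])
    # Step 1: grow each partition until its majority class has min_count
    # members, then keep extending while the next example has that class
    # (or the same value, since equal values cannot be separated).
    parts, current = [], []
    for i, (value, label) in enumerate(pairs):
        current.append((value, label))
        cls, count = majority([l for _, l in current])
        if i + 1 < len(pairs) and count >= min_count:
            next_value, next_label = pairs[i + 1]
            if next_label != cls and next_value != value:
                parts.append(current)
                current = []
    if current:
        parts.append(current)  # the last partition may fall short of min_count
    # Step 2: merge adjacent partitions that share a majority class.
    merged = []
    for part in parts:
        cls = majority([l for _, l in part])[0]
        if merged and merged[-1][1] == cls:
            merged[-1] = (merged[-1][0] + part, cls)
        else:
            merged.append((part, cls))
    # Step 3: place each breakpoint halfway between neighboring partitions.
    breakpoints = [(a[0][-1][0] + b[0][0][0]) / 2
                   for a, b in zip(merged, merged[1:])]
    classes = [cls for _, cls in merged]
    errors = sum(label != cls for part, cls in merged for _, label in part)
    return breakpoints, classes, errors

# Assumed attribute values (weather data), paired with the play classes.
temperature = list(zip(
    [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85],
    "yes no yes yes yes no no yes yes yes no yes yes no".split()))
humidity = list(zip(
    [65, 70, 70, 70, 75, 80, 80, 85, 86, 90, 90, 91, 95, 96],
    "yes no yes yes yes yes yes no yes no yes no no yes".split()))

print(discretize(temperature))  # ([77.5], ['yes', 'no'], 5)
print(discretize(humidity))     # ([82.5, 95.5], ['yes', 'no', 'yes'], 3)
```

Under these assumptions the sketch reproduces the breakpoints and training-set error counts quoted above. A different tie-breaking policy, such as looking at the adjacent partitions, would change the temperature result.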
Finally, if a numeric attribute has missing values, an additional category is
created for them, and the preceding discretization procedure is applied just to
the instances for which the attribute’s value is defined.
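As a rough illustration of the missing-value treatment, the sketch below maps each instance to an interval index, or to a separate "missing" category when the value is undefined, and then picks the majority class per category. The function names and the toy instances are hypothetical, and the breakpoints are assumed to have been computed beforehand from the instances whose value is defined.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

def categorize(value, breakpoints):
    """Interval index for a numeric value; None gets its own category."""
    if value is None:
        return "missing"
    # bisect_left puts a value equal to a breakpoint in the lower interval,
    # matching rules of the form "<= breakpoint".
    return bisect_left(breakpoints, value)

def one_rule(instances, breakpoints):
    """Majority class for each category of a single discretized attribute."""
    counts = defaultdict(Counter)
    for value, label in instances:
        counts[categorize(value, breakpoints)][label] += 1
    return {cat: c.most_common(1)[0][0] for cat, c in counts.items()}

# Hypothetical instances: (attribute value or None, class).
toy = [(80, "yes"), (70, "yes"), (90, "no"), (85, "no"),
       (96, "yes"), (None, "no"), (None, "no"), (None, "yes")]
print(one_rule(toy, [82.5, 95.5]))
# {0: 'yes', 1: 'no', 2: 'yes', 'missing': 'no'}
```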