7.2 DISCRETIZING NUMERIC ATTRIBUTES 303
discretization: it cannot produce adjacent intervals with the same label (such as
the first two of Figure 7.3). The reason is that merging two such intervals will
not affect the error count, but it will free up an interval that can be used elsewhere to reduce the error count.
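The argument can be checked on a small example. The sketch below assumes that each interval is labeled with its majority class and that an interval's error count is the number of its instances outside that class. The class labels and the helper names `interval_errors` and `total_errors` are invented for illustration and are not taken from the text. Merging two adjacent intervals that share a majority label leaves the total unchanged. Spending the freed interval on a split elsewhere then lowers it.

```python
from collections import Counter


def interval_errors(labels):
    """Instances in one interval that disagree with its majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def total_errors(intervals):
    """Error count of a whole discretization (a list of intervals)."""
    return sum(interval_errors(iv) for iv in intervals)


# Hypothetical class labels, sorted by attribute value and grouped into
# three intervals; the first two intervals both have majority "yes".
intervals = [
    ["yes", "yes", "no", "yes"],       # majority "yes", 1 error
    ["yes", "no", "yes"],              # majority "yes", 1 error
    ["no", "no", "no", "yes", "yes"],  # majority "no", 2 errors
]

# Merging the two "yes" intervals does not change the error count...
merged = [intervals[0] + intervals[1], intervals[2]]

# ...but frees an interval that can split the third one instead.
reused = [merged[0], intervals[2][:3], intervals[2][3:]]

print(total_errors(intervals), total_errors(merged), total_errors(reused))
# prints: 4 4 2
```

With the same budget of three intervals, the second discretization makes two errors instead of four. This is why an error-minimizing method never ends up with adjacent intervals that carry the same label.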
Why would anyone want to generate adjacent intervals with the same label?
The reason is best illustrated with an example. Figure 7.4 shows the instance
space for a simple two-class problem with two numeric attributes ranging from
0 to 1. Instances belong to one class (the dots) if their first attribute (a1) is less
than 0.3, or if it is less than 0.7 and their second attribute (a2) is less than 0.5.
Otherwise, they belong to the other class (triangles). The data in Figure 7.4 has
been artificially generated according to this rule.
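The class rule can be written directly as code. The sketch below is an illustration only. The text does not say how the points in Figure 7.4 were sampled or how many there are, so the uniform sampling over the unit square, the sample size, and the helper names `true_class` and `generate` are assumptions.

```python
import random


def true_class(a1, a2):
    """Rule from the text: dots if a1 < 0.3, or if a1 < 0.7 and a2 < 0.5."""
    if a1 < 0.3 or (a1 < 0.7 and a2 < 0.5):
        return "dot"
    return "triangle"


def generate(n, seed=0):
    """Assumed generator: n points drawn uniformly from [0, 1) x [0, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a1, a2 = rng.random(), rng.random()
        data.append((a1, a2, true_class(a1, a2)))
    return data


sample = generate(200)
print(sum(1 for _, _, c in sample if c == "dot"), "dots out of", len(sample))
```

Under uniform sampling, the dots region covers 0.3 + 0.4 × 0.5 = 0.5 of the square, so roughly half the points should be dots.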
Now suppose we are trying to discretize both attributes with a view to learning the classes from the discretized attributes. The very best discretization splits
a1 into three intervals (0 through 0.3, 0.3 through 0.7, and 0.7 through 1.0) and
a2 into two intervals (0 through 0.5 and 0.5 through 1.0). Given these nominal
Figure 7.4 Class distribution for a two-class, two-attribute problem. (Scatter plot with a1 on the horizontal axis and a2 on the vertical axis, both ranging from 0 to 1.)