whereas successive splits on a numeric attribute may continue to yield new
information. Whereas a nominal attribute can only be tested once on any path
from the root of a tree to the leaf, a numeric one can be tested many times. This
can yield trees that are messy and difficult to understand because the tests on
any single numeric attribute are not located together but can be scattered along
the path. An alternative, which is harder to accomplish but produces a more
readable tree, is to allow a multiway test on a numeric attribute, testing against
several constants at a single node of the tree. A simpler but less powerful solution is to prediscretize the attribute as described in Section 7.2.
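To make the idea concrete, here is a small Python sketch (not from the book; the node representation and cut points are purely illustrative) of a multiway test in which a single node holds several constants for one numeric attribute and routes an instance to one of several branches, rather than repeating binary tests along the path:

```python
import bisect

# A sketch of a multiway test on a numeric attribute: one node holds several
# sorted cut points and routes an instance into one of len(cuts) + 1 branches.
def multiway_branch(value, cuts):
    """Return the branch index for value, given sorted cut points."""
    return bisect.bisect_right(cuts, value)

# Branches for cut points 64, 71, 80: v < 64, 64 <= v < 71, 71 <= v < 80, v >= 80.
for temperature in (60, 70, 75, 85):
    print(temperature, "-> branch", multiway_branch(temperature, [64, 71, 80]))
```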
Missing values
The next enhancement to the decision-tree-building algorithm deals with the
problem of missing values. Missing values are endemic in real-world datasets.
As explained in Chapter 2 (page 58), one way of handling them is to treat them
as just another possible value of the attribute; this is appropriate if the fact that
the attribute is missing is significant in some way. In that case no further action
need be taken. But if there is no particular significance in the fact that a certain
instance has a missing attribute value, a more subtle solution is needed. It is
tempting to simply ignore all instances in which some of the values are missing,
but this solution is often too draconian to be viable. Instances with missing
values often provide a good deal of information. Sometimes the attributes
whose values are missing play no part in the decision, in which case these
instances are as good as any other.
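The first approach, treating a missing value as just another possible value of the attribute, is straightforward to implement. The following Python sketch (the attribute names and the MISSING marker are illustrative assumptions, not the book's notation) simply maps an absent value to a distinct value so that it follows its own branch like any other:

```python
# Treat a missing value as just another nominal value with its own branch.
MISSING = "missing"

def branch_value(instance, attribute):
    """Return the value used for branching, mapping an absent value to MISSING."""
    value = instance.get(attribute)
    return MISSING if value is None else value

instance = {"outlook": None, "windy": "false"}
print(branch_value(instance, "outlook"))  # "missing" -- follows its own branch
print(branch_value(instance, "windy"))    # "false"
```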
One question is how to apply a given decision tree to an instance in which
some of the attributes to be tested have missing values. We outlined a solution
in Section 3.2 that involves notionally splitting the instance into pieces, using a
numeric weighting method, and sending part of it down each branch in proportion to the number of training instances going down that branch. Eventually, the various parts of the instance will each reach a leaf node, and the
decisions at these leaf nodes must be recombined using the weights that have
percolated to the leaves. The information gain and gain ratio calculations
described in Section 4.3 can also be applied to partial instances. Instead of
having integer counts, the weights are used when computing both gain figures.
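The following Python sketch illustrates this weighting scheme at classification time. The node dictionaries, attribute names, and branch counts are assumptions made for the example, not the book's actual data structures: an instance whose test value is missing is split into fractional pieces, one per branch, in proportion to the training instances that went down each branch, and the class votes that reach the leaves are recombined by weight.

```python
from collections import Counter

def classify(node, instance, weight=1.0):
    """Return a Counter mapping class -> accumulated weight for one instance."""
    if node["kind"] == "leaf":
        return Counter({node["cls"]: weight})
    value = instance.get(node["attribute"])
    votes = Counter()
    if value is None:
        # Missing value: send a fraction of the instance down every branch,
        # proportional to the training instances observed on that branch.
        total = sum(node["counts"].values())
        for branch, child in node["branches"].items():
            fraction = node["counts"][branch] / total
            votes += classify(child, instance, weight * fraction)
    else:
        votes += classify(node["branches"][value], instance, weight)
    return votes

tree = {"kind": "test", "attribute": "outlook",
        "counts": {"sunny": 5, "rainy": 9},
        "branches": {"sunny": {"kind": "leaf", "cls": "no"},
                     "rainy": {"kind": "leaf", "cls": "yes"}}}
print(classify(tree, {"outlook": None}))  # roughly {'yes': 9/14, 'no': 5/14}
```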
Another question is how to partition the training set once a splitting attribute has been chosen, to allow recursive application of the decision tree formation procedure on each of the daughter nodes. The same weighting procedure is used. Instances for which the relevant attribute value is missing are notionally split into pieces, one piece for each branch, in the same proportion as the known instances go down the various branches. Pieces of the instance contribute to decisions at lower nodes in the usual way through the information gain calculation, except that they are weighted accordingly. They may be further