generalization is. Salzberg (1991) suggested that generalization with nested
exemplars can achieve a high degree of classification accuracy on a variety of
different problems, a conclusion disputed by Wettschereck and Dietterich (1995),
who argued that these results were fortuitous and did not hold in other
domains. Martin (1995) explored the idea that poor performance is caused not
by generalization itself but by the overgeneralization that occurs when
hyperrectangles nest or overlap, and demonstrated that excellent results are
achieved in a large number of domains if nesting and overlapping are avoided.
The generalized distance function based on transformations is described by
Cleary and Trigg (1995).
Exemplar generalization is a rare example of a learning strategy in which the
search proceeds from specific to general rather than from general to specific as
in the case of tree or rule induction. There is no particular reason why specific-
to-general searching should necessarily be handicapped by forcing the examples
to be considered in a strictly incremental fashion, and batch-oriented
methods exist that generate rules using a basic instance-based approach.
Moreover, it seems that the idea of producing conservative generalizations and
coping with instances that are not covered by choosing the “closest” generaliza-
tion is an excellent one that will eventually be extended to ordinary tree and
rule inducers.

6.5 Numeric prediction


Trees that are used for numeric prediction are just like ordinary decision
trees except that at each leaf they store either a class value that represents the
average value of instances that reach the leaf, in which case the tree is called a
regression tree, or a linear regression model that predicts the class value of
instances that reach the leaf, in which case it is called a model tree. In what
follows we will describe model trees because regression trees are really a special
case.
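
To make the distinction concrete, here is a minimal sketch in Java of the two
kinds of leaf. The class names (Leaf, RegressionLeaf, ModelLeaf) are invented
for illustration and this is not Weka's implementation: a regression tree leaf
returns a stored average, whereas a model tree leaf evaluates a stored linear
model on the attribute values of the instance.

// Hypothetical leaf types for numeric-prediction trees (illustration only).
interface Leaf {
    double predict(double[] x);
}

// Regression tree leaf: stores the mean class value of the training
// instances that reached it, and predicts that constant for every instance.
class RegressionLeaf implements Leaf {
    private final double mean;
    RegressionLeaf(double mean) { this.mean = mean; }
    public double predict(double[] x) { return mean; }
}

// Model tree leaf: stores a linear regression model over the attributes,
// so different instances reaching the same leaf get different predictions.
class ModelLeaf implements Leaf {
    private final double intercept;
    private final double[] weights;
    ModelLeaf(double intercept, double[] weights) {
        this.intercept = intercept;
        this.weights = weights;
    }
    public double predict(double[] x) {
        double sum = intercept;
        for (int i = 0; i < weights.length; i++) sum += weights[i] * x[i];
        return sum;
    }
}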
Regression and model trees are constructed by first using a decision tree
induction algorithm to build an initial tree. However, whereas most decision
tree algorithms choose the splitting attribute to maximize the information gain,
it is appropriate for numeric prediction to instead minimize the intrasubset
variation in the class values down each branch. Once the basic tree has been
formed, consideration is given to pruning the tree back from each leaf, just as
with ordinary decision trees. The only difference between regression tree and
model tree induction is that, for the latter, each node is replaced by a regression
plane instead of a constant value. The attributes that serve to define that regres-
sion are precisely those that participate in decisions in the subtree that will be
pruned, that is, in nodes underneath the current one.
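
The splitting criterion can be illustrated with a small calculation. Model tree
inducers in the M5 family measure intrasubset variation by the standard
deviation of the class values, choosing the split that maximizes the reduction
SDR = sd(T) - sum_i (|T_i|/|T|) x sd(T_i). The sketch below (hypothetical code,
not taken from any particular system) computes this quantity for a candidate
split; a large reduction means the split separates instances with similar class
values into the same branch.

import java.util.Arrays;

public class SplitCriterion {
    // Population standard deviation of a set of class values.
    static double sd(double[] values) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        double ss = 0.0;
        for (double v : values) ss += (v - mean) * (v - mean);
        return Math.sqrt(ss / values.length);
    }

    // Standard deviation reduction for splitting the parent's class values
    // into the given child subsets; larger values indicate better splits.
    static double sdr(double[] parent, double[][] children) {
        double weighted = 0.0;
        for (double[] child : children)
            weighted += (double) child.length / parent.length * sd(child);
        return sd(parent) - weighted;
    }

    public static void main(String[] args) {
        double[] parent = {1.0, 1.2, 0.9, 5.0, 5.1, 4.8};
        double[][] split = {{1.0, 1.2, 0.9}, {5.0, 5.1, 4.8}};
        // Separating low from high class values removes nearly all of the
        // variation, so the printed reduction is large.
        System.out.println(sdr(parent, split));
    }
}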