The expected error for test data at a node is calculated as described previously, using the linear model for prediction. Because of the compensation factor (n + ν)/(n − ν), where n is the number of training instances at the node and ν is the number of parameters in the model, it may be that the linear model can be further simplified by dropping terms to minimize the estimated error. Dropping a term decreases the multiplication factor, which may be enough to offset the inevitable increase in average error over the training instances. Terms are dropped one by one, greedily, as long as the error estimate decreases.
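
To make this concrete, here is a minimal sketch in Python of greedy term dropping driven by the compensated error estimate. The helper fit(X, y, terms), which refits the linear model on a subset of terms and returns its training residuals, is an assumption of the sketch, not part of the scheme itself:

```python
def compensated_error(residuals, num_params):
    # Average absolute error over the training instances, inflated by
    # (n + v)/(n - v); assumes n > v, where n is the number of training
    # instances and v the number of parameters in the linear model.
    n = len(residuals)
    avg_abs = sum(abs(r) for r in residuals) / n
    return avg_abs * (n + num_params) / (n - num_params)

def drop_terms_greedily(X, y, fit, terms):
    # Greedy backward elimination: repeatedly drop the term whose removal
    # gives the largest decrease in the compensated error estimate, and
    # stop when no single removal decreases it.  fit(X, y, terms) is a
    # hypothetical helper that refits the model and returns its residuals.
    best = compensated_error(fit(X, y, terms), len(terms))
    while len(terms) > 1:
        trials = []
        for t in terms:
            reduced = [u for u in terms if u != t]
            trials.append((compensated_error(fit(X, y, reduced),
                                             len(reduced)), reduced))
        err, candidate = min(trials, key=lambda p: p[0])
        if err >= best:
            break
        best, terms = err, candidate
    return terms, best
```

Because dropping a term lowers ν, the factor (n + ν)/(n − ν) shrinks, so a small rise in the average residual can still yield a lower overall estimate.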
Finally, once a linear model is in place for each interior node, the tree is pruned back from the leaves as long as the expected estimated error decreases. The expected error for the linear model at that node is compared with the expected error from the subtree below. To calculate the latter, the error from each branch is combined into a single, overall value for the node by weighting the branch by the proportion of the training instances that go down it and combining the error estimates linearly using those weights.
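
A compact sketch of this pruning pass follows; the Node fields (children, n_instances, model_error) are names assumed for illustration:

```python
class Node:
    # Bare-bones model-tree node: children is empty at a leaf,
    # n_instances counts the training instances reaching the node, and
    # model_error is the compensated error of the node's linear model.
    def __init__(self, n_instances, model_error, children=()):
        self.n_instances = n_instances
        self.model_error = model_error
        self.children = list(children)

def subtree_error(node):
    # Combine the branch error estimates linearly, weighting each branch
    # by the proportion of training instances that go down it.
    if not node.children:
        return node.model_error
    total = sum(c.n_instances for c in node.children)
    return sum(c.n_instances / total * subtree_error(c)
               for c in node.children)

def prune(node):
    # Prune back from the leaves: a node becomes a leaf whenever its
    # linear model's expected error is no worse than its subtree's.
    for child in node.children:
        prune(child)
    if node.children and node.model_error <= subtree_error(node):
        node.children = []
```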

Nominal attributes


Before constructing a model tree, all nominal attributes are transformed into binary variables that are then treated as numeric. For each nominal attribute, the average class value corresponding to each possible value in the enumeration is calculated from the training instances, and the values in the enumeration are sorted according to these averages. Then, if the nominal attribute has k possible values, it is replaced by k − 1 synthetic binary attributes, the ith being 0 if the value is one of the first i in the ordering and 1 otherwise. Thus all splits are binary: they involve either a numeric attribute or a synthetic binary one, treated as a numeric attribute.
It is possible to prove analytically that the best split at a node for a nominal variable with k values is one of the k − 1 positions obtained by ordering the average class values for each value of the attribute. This sorting operation should really be repeated at each node; however, there is an inevitable increase in noise because of small numbers of instances at lower nodes in the tree (and in some cases nodes may not represent all values for some attributes), and not much is lost by performing the sorting just once, before starting to build a model tree.
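
The transformation itself is straightforward. The following sketch (with assumed list-based inputs: one nominal value and one numeric class value per training instance) sorts the enumeration by average class value and emits the k − 1 synthetic binary attributes:

```python
def nominal_to_binary(values, class_values):
    # Average class value for each value in the enumeration.
    sums, counts = {}, {}
    for v, c in zip(values, class_values):
        sums[v] = sums.get(v, 0.0) + c
        counts[v] = counts.get(v, 0) + 1
    ordering = sorted(sums, key=lambda v: sums[v] / counts[v])
    rank = {v: i for i, v in enumerate(ordering)}
    k = len(ordering)
    # The ith synthetic attribute (i = 1 .. k-1) is 0 if the instance's
    # value is among the first i in the ordering and 1 otherwise.
    return [[0 if rank[v] < i else 1 for i in range(1, k)]
            for v in values]
```

For instance, with values a, b, and c whose average class values order them as b < a < c, the synthetic attribute vectors are b → (0, 0), a → (1, 0), and c → (1, 1).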

Missing values


To take account of missing values, a modification is made to the SDR formula.
The final formula, including the missing value compensation, is

SDR = (m / |T|) × [ sd(T) − Σ_{j ∈ {L, R}} (|T_j| / |T|) × sd(T_j) ]

where m is the number of instances with a known value for the attribute being tested, T is the set of instances that reach the node, and T_L and T_R are the subsets that result from the binary split on that attribute.