value (or distribution) of the uncovered training instances. The information gain
(nominal class) or variance reduction (numeric class) of each antecedent is com-
puted, and rules are pruned using reduced-error pruning. ZeroR is even simpler:
it predicts the training data's majority class (if nominal) or average value (if
numeric) for every test instance.
Prism implements the elementary covering algorithm for rules (Section 4.4).
Part obtains rules from partial decision trees (Section 6.2, pages 207–210). It
builds the tree using C4.5’s heuristics with the same user-defined parameters as
J4.8. M5Rules obtains regression rules from model trees built using M5′ (Section
6.5, pages 250–251). Ridor learns rules with exceptions (Section 6.2, pages
210–213) by generating the default rule, using incremental reduced-error
pruning to find exceptions with the smallest error rate, finding the best excep-
tions for each exception, and iterating.
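As a concrete illustration of the simplest of these schemes, the following is a minimal,
self-contained Java sketch of the ZeroR idea: determine the majority class of the training
labels and predict it for every instance. It illustrates the concept only; the class and
method names are invented for this example and are not Weka's own weka.classifiers.rules.ZeroR.

import java.util.HashMap;
import java.util.Map;

// Minimal illustration of the ZeroR idea: ignore all attributes and
// always predict the majority class observed in the training labels.
// (Hypothetical stand-alone sketch, not Weka's implementation.)
public class ZeroRSketch {
    private String majorityClass;

    // "Training" reduces to counting class labels and remembering the mode.
    public void buildClassifier(String[] trainingLabels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : trainingLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        majorityClass = counts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }

    // Every test instance receives the same prediction.
    public String classifyInstance() {
        return majorityClass;
    }

    public static void main(String[] args) {
        ZeroRSketch zeroR = new ZeroRSketch();
        zeroR.buildClassifier(new String[] {"yes", "yes", "no", "yes", "no"});
        System.out.println(zeroR.classifyInstance());   // prints "yes"
    }
}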
JRip implements RIPPER (Section 6.2, pages 205–207), including heuristic
global optimization of the rule set (Cohen 1995). Nnge is a nearest-neighbor
method for generating rules using nonnested generalized exemplars (Section
6.4, pages 238–239).
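To make the interface concrete, here is a minimal sketch of how such a rule learner
might be invoked programmatically. It assumes the standard Weka 3 Java API
(weka.classifiers.rules.JRip, ConverterUtils.DataSource) and a nominal-class ARFF
file whose path is an assumption of the example.

import weka.classifiers.rules.JRip;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: learn a RIPPER rule set with JRip and print the rules.
public class JRipExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.nominal.arff"); // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);             // class is the last attribute

        JRip ripper = new JRip();
        ripper.setOptimizations(2);     // number of global optimization runs
        ripper.buildClassifier(data);

        System.out.println(ripper);     // toString() lists the learned rules
    }
}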
Functions
The functions category of Table 10.5 includes an assorted group of classifiers
that can be written down as mathematical equations in a reasonably natural
way. Other methods, such as decision trees and rules, cannot (there are excep-
tions: Naïve Bayes has a simple mathematical formulation). Three of them
implement linear regression (Section 4.6). SimpleLinearRegression learns a linear
regression model based on a single attribute—it chooses the one that yields
the smallest squared error. Missing values and nonnumeric attributes are not
allowed. LinearRegression performs standard least-squares linear regression and
can optionally perform attribute selection, either by greedily using backward
elimination (Section 7.1) or by building a full model from all attributes and
dropping terms one by one in decreasing order of their standardized coefficients
until a stopping criterion is reached (this method was described in a slightly dif-
ferent context in Section 6.5 under Pruning the tree, page 245). Both methods
use a version of the AIC termination criterion of Section 6.7 (page 277). The
implementation has two further refinements: a mechanism for detecting
collinear attributes (which can be turned off) and a ridge parameter that stabi-
lizes degenerate cases and can reduce overfitting by penalizing large coefficients.
Technically, LinearRegression implements ridge regression, which is described in
standard statistics texts.
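As a sketch of how the ridge parameter and attribute selection options might be set
programmatically, the following assumes the Weka 3 API (weka.classifiers.functions.
LinearRegression, weka.core.SelectedTag) and a numeric-class ARFF file whose path is
an assumption of the example; the option names shown are those of that API as the
author understands them, not a definitive listing.

import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch: least-squares regression with a ridge penalty and greedy
// backward-elimination attribute selection.
public class LinearRegressionExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("cpu.arff");       // path is an assumption
        data.setClassIndex(data.numAttributes() - 1);        // numeric class attribute

        LinearRegression lr = new LinearRegression();
        lr.setRidge(1.0e-4);                                  // penalize large coefficients
        lr.setAttributeSelectionMethod(new SelectedTag(
                LinearRegression.SELECTION_GREEDY,            // greedy backward elimination
                LinearRegression.TAGS_SELECTION));
        lr.buildClassifier(data);

        System.out.println(lr);                               // prints the fitted equation
    }
}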
LeastMedSq is a robust linear regression method that minimizes the median
(rather than the mean) of the squares of divergences from the regression line
(Section 7.4) (Rousseeuw and Leroy 1987). It repeatedly applies standard linear