regression to subsamples of the data and outputs the solution that has the smallest median-squared error.
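The scheme described here corresponds to Weka's LeastMedSq class (an assumption based on the description; the class lives in weka.classifiers.functions in the Weka 3.4 distribution). A minimal sketch of training it on a numeric-class ARFF file, using the cpu.arff sample dataset shipped with Weka:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.functions.LeastMedSq;
    import weka.core.Instances;

    public class LeastMedSqDemo {
      public static void main(String[] args) throws Exception {
        // Load the data; the numeric class is assumed to be the last attribute
        Instances data = new Instances(
            new BufferedReader(new FileReader("cpu.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        LeastMedSq lms = new LeastMedSq();
        lms.buildClassifier(data); // fits regressions to subsamples internally
        System.out.println(lms);   // prints the chosen linear model
      }
    }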
SMO implements the sequential minimal optimization algorithm for training a support vector classifier (Section 6.3), using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001). Missing values are replaced globally, nominal attributes are transformed into binary ones, and attributes are normalized by default; note that the coefficients in the output are based on the normalized data. Normalization can be turned off, or the input can be standardized to zero mean and unit variance. Pairwise classification is used for multiclass problems. Logistic regression models can be fitted to the support vector machine output to obtain probability estimates. In the multiclass case the predicted probabilities are coupled pairwise (Hastie and Tibshirani 1998). When working with sparse instances, turn normalization off for faster operation. SMOreg implements the sequential minimal optimization algorithm for regression problems (Smola and Schölkopf 1998).
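As an illustration, here is a minimal sketch of training SMO with probability estimates and evaluating it by cross-validation; it assumes the Weka 3.4 class names, the setBuildLogisticModels() setter, and the iris.arff sample dataset shipped with Weka:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.SMO;
    import weka.core.Instances;

    public class SMODemo {
      public static void main(String[] args) throws Exception {
        // Load the data; the class is assumed to be the last attribute
        Instances data = new Instances(
            new BufferedReader(new FileReader("iris.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        SMO smo = new SMO();
        smo.setBuildLogisticModels(true); // fit logistic models for probabilities
        // Because iris has three classes, SMO trains pairwise binary classifiers
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(smo, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
      }
    }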
VotedPerceptron is the voted perceptron algorithm (Section 6.3, pages 222–223). Winnow (Section 4.6, pages 126–128) modifies the basic perceptron to use multiplicative updates. The implementation allows a second multiplier, b (different from 1/a), to be used in place of the divisions in Figure 4.11, and also provides the balanced version of the algorithm.
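To make the multiplicative update concrete, here is a minimal sketch of one Winnow update step (not Weka's weka.classifiers.functions.Winnow implementation itself), assuming binary attributes and a fixed threshold; alpha and beta play the roles of a and b above:

    // Sketch of one Winnow update on a possibly misclassified binary instance x.
    // alpha > 1 promotes; 0 < beta < 1 demotes, so multiplying by beta
    // replaces the division by alpha in Figure 4.11.
    public class WinnowSketch {
      public static void update(double[] w, int[] x, int actual,
                                double theta, double alpha, double beta) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) sum += w[i] * x[i];
        int predicted = (sum > theta) ? 1 : 0;
        if (predicted == actual) return;          // mistake-driven: no change
        for (int i = 0; i < w.length; i++) {
          if (x[i] == 1) {                        // only active attributes change
            w[i] *= (actual == 1) ? alpha : beta; // promote misses, demote false alarms
          }
        }
      }
    }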
PaceRegression builds linear regression models using the new technique of Pace regression (Wang and Witten 2002). When there are many attributes, Pace regression is particularly good at determining which ones to discard; indeed, under certain regularity conditions it is provably optimal as the number of attributes tends to infinity.
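A minimal usage sketch, assuming the weka.classifiers.functions.PaceRegression class from the Weka 3.4 distribution and a numeric class attribute:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.functions.PaceRegression;
    import weka.core.Instances;

    public class PaceDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(
            new BufferedReader(new FileReader("cpu.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        PaceRegression pace = new PaceRegression();
        pace.buildClassifier(data);
        // Attributes judged uninformative are effectively discarded from the model
        System.out.println(pace);
      }
    }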
SimpleLogistic builds logistic regression models (Section 4.6, pages 121–124), fitting them using LogitBoost with simple regression functions as base learners and determining how many iterations to perform using cross-validation, a process that supports automatic attribute selection (Landwehr et al. 2003). Logistic is an alternative implementation for building and using a multinomial logistic regression model with a ridge estimator to guard against overfitting by penalizing large coefficients, based on work by le Cessie and van Houwelingen (1992).
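For example, a ridge-penalized model can be fitted as follows (a minimal sketch assuming Weka's Logistic class and its setRidge() setter, with the diabetes.arff sample dataset shipped with Weka):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;

    public class LogisticDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(
            new BufferedReader(new FileReader("diabetes.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        Logistic logistic = new Logistic();
        logistic.setRidge(1e-8); // larger values penalize large coefficients more
        logistic.buildClassifier(data);
        System.out.println(logistic); // prints the fitted coefficients
      }
    }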
RBFNetwork implements a Gaussian radial basis function network (Section 6.3, page 234), deriving the centers and widths of hidden units using k-means and combining the outputs obtained from the hidden layer using logistic regression if the class is nominal and linear regression if it is numeric. The activations of the basis functions are normalized to sum to one before they are fed into the linear models. You can specify k, the number of clusters; the maximum number of logistic regression iterations for nominal-class problems; the minimum standard deviation for the clusters; and the ridge value for regression. If the class is nominal, k-means is applied separately to each class to derive k clusters for each class.
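A sketch of setting these parameters programmatically; the setter names below follow the Weka 3.4 API and should be treated as assumptions if you are using another release:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import weka.classifiers.functions.RBFNetwork;
    import weka.core.Instances;

    public class RBFDemo {
      public static void main(String[] args) throws Exception {
        Instances data = new Instances(
            new BufferedReader(new FileReader("iris.arff")));
        data.setClassIndex(data.numAttributes() - 1);
        RBFNetwork rbf = new RBFNetwork();
        rbf.setNumClusters(3); // k: clusters (basis functions) per class here
        rbf.setMinStdDev(0.1); // minimum width of each Gaussian
        rbf.setRidge(1e-8);    // ridge value for the logistic/linear model
        rbf.buildClassifier(data);
        System.out.println(rbf);
      }
    }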
