LBR (for Lazy Bayesian Rules) is a Bayesian classifier that defers all processing
to classification time. For each test instance it selects a set of attributes for
which the independence assumption should not be made; the others are treated
as independent of each other given the class and the selected set of attributes.
It works well for small test sets (Zheng and Webb 2000).
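A minimal sketch of running LBR from Java follows, assuming the class sits at
weka.classifiers.lazy.LBR as in the WEKA 3.4 distribution; the ARFF file name is a
placeholder and should point to a dataset with nominal attributes.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.lazy.LBR;   // package location assumed (WEKA 3.4 era)
import weka.core.Instances;

public class LBRDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder file: LBR expects nominal attributes.
        Instances data = new Instances(new BufferedReader(new FileReader("weather.nominal.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        LBR lbr = new LBR();
        lbr.buildClassifier(data);   // cheap: LBR defers the real work...
        // ...to classification time, when attributes are selected per test instance
        double label = lbr.classifyInstance(data.instance(0));
        System.out.println(data.classAttribute().value((int) label));
    }
}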
LWL is a general algorithm for locally weighted learning. It assigns weights
using an instance-based method and builds a classifier from the weighted
instances. The classifier is selected in LWL's object editor: a good choice is Naïve
Bayes for classification problems and linear regression for regression problems
(Section 6.5, pages 251–253). You can set the number of neighbors used, which
determines the kernel bandwidth, and the kernel shape to use for weighting:
linear, inverse, or Gaussian. Attribute normalization is turned on by default.
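The same choices can be made programmatically; a rough sketch, assuming WEKA's
weka.classifiers.lazy.LWL exposes the setKNN and setWeightingKernel setters for the
options described above (the neighbor count, kernel code, and file name below are
illustrative).

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.LWL;
import weka.core.Instances;

public class LWLDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("data.arff"))); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        LWL lwl = new LWL();
        lwl.setClassifier(new NaiveBayes()); // base learner: Naive Bayes for classification
        lwl.setKNN(50);                      // neighbors used -> kernel bandwidth (illustrative value)
        lwl.setWeightingKernel(0);           // 0 = linear here; see LWL's constants for inverse/Gaussian

        lwl.buildClassifier(data);
        System.out.println(lwl.classifyInstance(data.instance(0)));
    }
}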

Miscellaneous classifiers


The misc. category includes two simple classifiers that were mentioned at the
end of Section 4.7 (page 136). Hyperpipes, for discrete classification problems,
records the range of values observed in the training data for each attribute and
category and works out which ranges contain the attribute values of a test
instance, choosing the category with the largest number of correct ranges. VFI
(voting feature intervals) constructs intervals around each class by discretizing
numeric attributes and using point intervals for nominal ones, records class
counts for each interval on each attribute, and classifies test instances by voting
(Demiroz and Guvenir 1997). A simple attribute weighting scheme assigns
higher weight to more confident intervals, where confidence is a function of
entropy. VFI is faster than Naïve Bayes but slower than hyperpipes. Neither
method can handle missing values.
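A hedged comparison of the two from Java, assuming both classes live in
weka.classifiers.misc as in the WEKA 3.4 release; the ARFF file is a placeholder and,
per the caveat above, should contain no missing values.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.misc.HyperPipes;   // package assumed (WEKA 3.4 era)
import weka.classifiers.misc.VFI;          // package assumed (WEKA 3.4 era)
import weka.core.Instances;

public class MiscDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder file: neither scheme handles missing values.
        Instances data = new Instances(new BufferedReader(new FileReader("data.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Ten-fold cross-validation of each classifier on the same data
        Classifier[] schemes = { new HyperPipes(), new VFI() };
        for (Classifier c : schemes) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));
            System.out.println(c.getClass().getSimpleName() + ": "
                    + eval.pctCorrect() + "% correct");
        }
    }
}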

10.5 Metalearning algorithms


Metalearning algorithms, listed in Table 10.6, take classifiers and turn them into
more powerful learners. One parameter specifies the base classifier; others
specify the number of iterations for schemes such as bagging and boosting and
an initial seed for the random number generator. We already met FilteredClassifier
in Section 10.3: it runs a classifier on data that has been passed through a
filter, which is a parameter. The filter’s own parameters are based exclusively on
the training data, which is the appropriate way to apply a supervised filter to
test data.
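As an illustration, here is a sketch that wraps J48 in a supervised discretization
filter using weka.classifiers.meta.FilteredClassifier; the class and setter names are
WEKA's, the file name is a placeholder.

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.supervised.attribute.Discretize;

public class FilteredDemo {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(new BufferedReader(new FileReader("train.arff"))); // placeholder
        train.setClassIndex(train.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new Discretize());   // supervised filter: cut points come from the training data only
        fc.setClassifier(new J48());
        fc.buildClassifier(train);        // the fitted filter is then re-applied unchanged to test instances

        System.out.println(fc);
    }
}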

Bagging and randomization


Bagging bags a classifier to reduce variance (Section 7.5, page 316). This implementation
works for both classification and regression, depending on the base learner.
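A minimal sketch of bagging a tree learner through weka.classifiers.meta.Bagging
follows; the base learner, iteration count, and file name are illustrative choices
rather than defaults taken from the text.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.Bagging;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;

public class BaggingDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("data.arff"))); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        Bagging bagger = new Bagging();
        bagger.setClassifier(new REPTree()); // base learner; REPTree handles classification and regression
        bagger.setNumIterations(10);         // number of bagged models (illustrative)
        bagger.setSeed(1);                   // seed for the random resampling

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(bagger, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}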
