Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

One parameter is the name of the Java class that implements the function
(which must be a fully qualified name); another is the name of the
transformation method itself.
Normalize scales all numeric values in the dataset to lie between 0 and 1.
Standardize transforms them to have zero mean and unit variance. Both skip the
class attribute, if set.
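
The two rescalings can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the filters' actual code: the function names are hypothetical, the treatment of a constant column is an assumption, and the population standard deviation is used here so that the result has exactly unit variance (the text does not say which variance estimate the filter uses).

```python
from statistics import mean, pstdev

def normalize(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale so the result has zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical numeric column
temperature = [64, 68, 72, 80, 85]
scaled = normalize(temperature)
zscores = standardize(temperature)
```

A full filter would apply these functions to every numeric attribute except the class.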

Changing values
SwapValues swaps the positions of two values of a nominal attribute. The order
of values is entirely cosmetic (it does not affect learning at all), but if the class
is selected, changing the order affects the layout of the confusion matrix.
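
To see why the order of class values matters only for layout, here is a small sketch in plain Python. The helper names and the toy predictions are hypothetical; the point is that swapping two labels permutes the rows and columns of the confusion matrix without changing any count.

```python
def swap_values(labels, first, second):
    """Return a copy of the label list with two labels exchanged."""
    i, j = labels.index(first), labels.index(second)
    swapped = list(labels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

def confusion_matrix(labels, actual, predicted):
    """Rows are actual classes, columns are predicted classes, in label order."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical class values and predictions
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

labels = ["yes", "no"]
before = confusion_matrix(labels, actual, predicted)
after = confusion_matrix(swap_values(labels, "yes", "no"), actual, predicted)
```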
MergeTwoValues merges values of a nominal attribute into a single category. The
new value’s name is a concatenation of the two original ones, and every
occurrence of either of the original values is replaced by the new one. The index of
the new value is the smaller of the original indices. For example, if you merge
the first two values of the outlook attribute in the weather data (in which there
are five sunny, four overcast, and five rainy instances), the new outlook attribute
will have values sunny_overcast and rainy; there will be nine sunny_overcast
instances and the original five rainy ones.
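
The weather-data example can be reproduced with a short sketch in plain Python. The function name is hypothetical, and joining the two names with an underscore in index order is an assumption inferred from the sunny_overcast example.

```python
from collections import Counter

def merge_two_values(labels, values, first, second):
    """Merge two nominal labels; the merged label takes the smaller index."""
    if first == second:
        raise ValueError("the two values to merge must differ")
    i, j = sorted((labels.index(first), labels.index(second)))
    merged = f"{labels[i]}_{labels[j]}"  # assumption: joined in index order
    # Drop the label at the larger index, then overwrite the smaller one.
    new_labels = [label for k, label in enumerate(labels) if k != j]
    new_labels[i] = merged
    new_values = [merged if v in (first, second) else v for v in values]
    return new_labels, new_values

# The outlook attribute of the weather data: 5 sunny, 4 overcast, 5 rainy
labels = ["sunny", "overcast", "rainy"]
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

new_labels, new_outlook = merge_two_values(labels, outlook, "sunny", "overcast")
counts = Counter(new_outlook)
```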
One way of dealing with missing values is to replace them globally before
applying a learning scheme. ReplaceMissingValues replaces each missing value
with the mean for numeric attributes and the mode for nominal ones. If a class
is set, missing values of that attribute are not replaced.
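
A minimal sketch of this kind of global replacement, in plain Python, follows. It assumes a dataset held as a dictionary of columns with None marking a missing value; the function name, the data layout, and the choice to leave an entirely missing column untouched are all assumptions made for illustration.

```python
from statistics import mean, mode

def replace_missing_values(data, numeric_attrs, class_attr=None):
    """Fill None with the column mean (numeric) or mode (nominal).

    The class attribute, if given, is copied unchanged.
    """
    result = {}
    for name, column in data.items():
        present = [v for v in column if v is not None]
        if name == class_attr or not present:
            # Class column, or nothing to estimate from: leave as is.
            result[name] = list(column)
            continue
        fill = mean(present) if name in numeric_attrs else mode(present)
        result[name] = [fill if v is None else v for v in column]
    return result

# Hypothetical four-instance dataset
data = {
    "humidity": [70, None, 90, 80],
    "outlook": ["sunny", None, "sunny", "rainy"],
    "play": ["yes", None, "no", "no"],
}
filled = replace_missing_values(data, numeric_attrs={"humidity"}, class_attr="play")
```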

Conversions
Many filters convert attributes from one form to another. Discretize uses
equal-width or equal-frequency binning (Section 7.2) to discretize a range of numeric
attributes, specified in the usual way. For equal-width binning, the number of bins
can be specified or chosen automatically by maximizing the likelihood using
leave-one-out cross-validation. PKIDiscretize discretizes numeric attributes
using equal-frequency binning in which the number of bins is the square root
of the number of values (excluding missing values). Both these filters skip the
class attribute.
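
The difference between the two binning methods is easiest to see on a skewed column. The sketch below is plain Python with hypothetical function names. It is simplified: the equal-frequency version assigns bins by rank and so may split tied values, and rounding the square root down to an integer is an assumption, since the text does not say how the bin count is rounded.

```python
import math

def equal_width_bins(values, n_bins):
    """Split the range [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def pki_bin_count(values):
    """Square root of the number of non-missing values, rounded down (assumption)."""
    present = [v for v in values if v is not None]
    return max(1, math.isqrt(len(present)))

# Hypothetical column with one outlier
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
by_width = equal_width_bins(values, 3)
by_frequency = equal_frequency_bins(values, 3)
n_bins = pki_bin_count(values + [None])
```

With the outlier present, equal-width binning puts eight of the nine values in the first bin, whereas equal-frequency binning puts three in each.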
MakeIndicator converts a nominal attribute into a binary indicator attribute
and can be used to transform a multiclass dataset into several two-class ones. It
substitutes a binary attribute for the chosen nominal one, whose value for each
instance is 1 if a particular original value was present and 0 otherwise. The new
attribute is declared to be numeric by default, but it can be made nominal if
desired.
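
As a rough illustration in plain Python, an indicator column and the resulting family of two-class problems might look like this. The function names are hypothetical, and the alphabetical ordering of the class labels is an assumption.

```python
def make_indicator(column, target):
    """Return 1 where the nominal value equals target, 0 otherwise."""
    return [1 if v == target else 0 for v in column]

def one_vs_rest(class_column):
    """Build one numeric indicator column per distinct class value."""
    labels = sorted(set(class_column))  # assumption: alphabetical order
    return {label: make_indicator(class_column, label) for label in labels}

# Hypothetical three-class column
outlook = ["sunny", "overcast", "rainy", "sunny"]
is_sunny = make_indicator(outlook, "sunny")
problems = one_vs_rest(outlook)
```

Each entry of problems can then serve as the class of a separate two-class dataset.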
Some learning schemes, such as support vector machines, only handle binary
attributes. The NominalToBinary filter transforms all multivalued nominal

