Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


10.3 FILTERING ALGORITHMS 397


or first-3,5,9-last for attributes 1, 2, 3, 5, 9, 10, 11, 12, .... The selection can be inverted, affecting all attributes except those specified. These features are shared by many filters.
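The range syntax can be sketched in Java as a small parser. The class RangeSketch and its select method are hypothetical illustrations, not Weka's own code. Indices are 1-based, "first" and "last" stand for the first and last attribute, and setting invert returns every attribute that is not named. Error handling for malformed specifications is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of range selection such as "first-3,5,9-last"; not Weka's code.
final class RangeSketch {

    // Resolve one endpoint: "first", "last", or a 1-based attribute number.
    private static int endpoint(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) {
            return 1;
        }
        if (t.equalsIgnoreCase("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);
    }

    /** Selected 1-based attribute indices in ascending order. */
    static List<Integer> select(String spec, int numAttributes, boolean invert) {
        boolean[] listed = new boolean[numAttributes + 1]; // slot 0 unused
        for (String part : spec.split(",")) {
            String[] ends = part.split("-");
            int lo = endpoint(ends[0], numAttributes);
            int hi = ends.length > 1 ? endpoint(ends[1], numAttributes) : lo;
            for (int i = Math.max(lo, 1); i <= Math.min(hi, numAttributes); i++) {
                listed[i] = true;
            }
        }
        List<Integer> selected = new ArrayList<>();
        for (int i = 1; i <= numAttributes; i++) {
            // Without inversion keep the listed attributes; with it keep the rest.
            if (listed[i] != invert) {
                selected.add(i);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(select("first-3,5,9-last", 12, false)); // [1, 2, 3, 5, 9, 10, 11, 12]
        System.out.println(select("first-3,5,9-last", 12, true));  // [4, 6, 7, 8]
    }
}
```

With twelve attributes, "last" resolves to 12, so the inverted selection is exactly the four attributes the specification leaves out.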
Remove has already been described. Similar filters are RemoveType, which deletes all attributes of a given type (nominal, numeric, string, or date), and RemoveUseless, which deletes constant attributes and nominal attributes whose values are different for almost all instances. You can decide how much variation is tolerated before an attribute is deleted by specifying the number of distinct values as a percentage of the total number of values. Some unsupervised attribute filters behave differently if the menu in the Preprocess panel has been used to set a class attribute. For example, RemoveType and RemoveUseless both skip the class attribute.
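A minimal sketch of the RemoveUseless rule as described above, assuming each attribute arrives as a column of values and the tolerance is given as a maximum percentage of distinct values. The names UselessCheck, isUseless, and columnsToRemove are hypothetical, and Weka's filter may differ in details such as missing-value handling. Only nominal attributes face the percentage test; numeric ones are dropped only when constant, and the class column is skipped.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the RemoveUseless rule; not Weka's own code.
final class UselessCheck {

    /**
     * True if the column is constant, or if it is nominal and its distinct values,
     * as a percentage of all its values, exceed maxVariancePercent.
     */
    static boolean isUseless(List<?> column, boolean nominal, double maxVariancePercent) {
        Set<Object> distinct = new HashSet<>(column);
        if (distinct.size() <= 1) {
            return true; // constant attribute: carries no information
        }
        if (!nominal) {
            return false; // numeric attributes are removed only when constant
        }
        double percent = 100.0 * distinct.size() / column.size();
        return percent > maxVariancePercent; // different for (almost) every instance
    }

    /** 0-based indices of columns to delete; the class column is never considered. */
    static List<Integer> columnsToRemove(List<List<?>> columns, boolean[] nominal,
                                         int classIndex, double maxVariancePercent) {
        List<Integer> remove = new ArrayList<>();
        for (int j = 0; j < columns.size(); j++) {
            if (j == classIndex) {
                continue; // skip the class attribute, as RemoveUseless does
            }
            if (isUseless(columns.get(j), nominal[j], maxVariancePercent)) {
                remove.add(j);
            }
        }
        return remove;
    }

    public static void main(String[] args) {
        List<List<?>> columns = List.of(
            List.of("a", "a", "a", "a"),                     // constant
            List.of("id1", "id2", "id3", "id4"),             // unique per instance
            List.of("sunny", "rainy", "sunny", "overcast")); // 3 of 4 values distinct
        boolean[] nominal = {true, true, true};
        System.out.println(columnsToRemove(columns, nominal, -1, 99.0)); // [0, 1]
        System.out.println(columnsToRemove(columns, nominal, 1, 99.0));  // [0]
    }
}
```

With the 99% threshold used here (chosen only for illustration), the identifier-like column is removed because every value is distinct, while the outlook column, at 75% distinct values, survives. Declaring column 1 the class attribute protects it from deletion.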
AddCluster applies a clustering algorithm to the data before filtering it. You use the object editor to choose the clustering algorithm. Clusterers are configured just as filters are (Section 10.6). The AddCluster object editor contains its own Choose button for the clusterer, and you configure the clusterer by clicking its line and getting another object editor panel, which must be filled in before returning to the AddCluster object editor. This is probably easier to understand when you do it in practice than when you read about it in a book! At any rate, once you have chosen a clusterer, AddCluster uses it to assign a cluster number to each instance, as a new attribute. The object editor also allows you to ignore certain attributes when clustering, specified as described previously for Copy.
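What AddCluster does once a clusterer has been built can be sketched as follows. The clusterer here is a hypothetical stand-in, nearest-centroid assignment with fixed centroids such as k-means might produce, and AddClusterSketch is an illustrative name. Ignored attributes are left out of the distance computation but still appear in the output. The cluster number is appended as a double for simplicity; Weka's filter may represent it differently.

```java
import java.util.Arrays;

// Hypothetical sketch: append a cluster number computed by a nearest-centroid rule.
final class AddClusterSketch {

    /** Index of the closest centroid, using squared Euclidean distance over non-ignored attributes. */
    static int nearestCentroid(double[] x, double[][] centroids, boolean[] ignore) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                if (ignore[j]) {
                    continue; // ignored attributes do not influence clustering
                }
                double diff = x[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Copy of the data with the cluster number as an extra last attribute. */
    static double[][] addCluster(double[][] data, double[][] centroids, boolean[] ignore) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            out[i] = Arrays.copyOf(data[i], data[i].length + 1);
            out[i][data[i].length] = nearestCentroid(data[i], centroids, ignore);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 1.0, 50.0}, {9.0, 8.0, 0.0} };
        double[][] centroids = { {0.0, 0.0, 0.0}, {10.0, 10.0, 50.0} };
        boolean[] ignore = {false, false, true}; // ignore the third attribute
        for (double[] row : addCluster(data, centroids, ignore)) {
            System.out.println(Arrays.toString(row));
        }
        // [1.0, 1.0, 50.0, 0.0]
        // [9.0, 8.0, 0.0, 1.0]
    }
}
```

Ignoring the third attribute changes the first instance's assignment: with it included, the value 50.0 pulls the instance toward centroid 1; without it, centroid 0 is closer.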
ClusterMembership uses a clusterer, again specified in the filter's object editor, to generate membership values. A new version of each instance is created whose attributes are these values. The class attribute, if set, is left unaltered.
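A sketch of ClusterMembership under a loud assumption: the clusterer is a hypothetical mixture of equally weighted spherical Gaussians sharing one standard deviation sigma, so membership values are a softmax of -||x - mu||^2 / (2 sigma^2). In Weka the probabilities come from whichever clusterer is chosen in the object editor. MembershipSketch and its methods are illustrative names. The class value is copied across unchanged, as the text describes.

```java
import java.util.Arrays;

// Hypothetical sketch: replace an instance by cluster membership probabilities.
final class MembershipSketch {

    /** Membership probabilities under equally weighted spherical Gaussians (assumed model). */
    static double[] memberships(double[] x, double[][] means, double sigma) {
        double[] logp = new double[means.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double d2 = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - means[c][j];
                d2 += diff * diff;
            }
            logp[c] = -d2 / (2.0 * sigma * sigma);
            max = Math.max(max, logp[c]);
        }
        double[] p = new double[means.length];
        double sum = 0.0;
        for (int c = 0; c < means.length; c++) {
            p[c] = Math.exp(logp[c] - max); // subtract the max for numerical stability
            sum += p[c];
        }
        for (int c = 0; c < means.length; c++) {
            p[c] /= sum; // normalize so memberships sum to 1
        }
        return p;
    }

    /** New instance: membership values followed by the unaltered class value. */
    static double[] transform(double[] attributes, double classValue,
                              double[][] means, double sigma) {
        double[] p = memberships(attributes, means, sigma);
        double[] out = Arrays.copyOf(p, p.length + 1);
        out[p.length] = classValue;
        return out;
    }

    public static void main(String[] args) {
        double[][] means = { {-1.0, 0.0}, {1.0, 0.0} };
        // A point midway between the two means belongs equally to both.
        System.out.println(Arrays.toString(transform(new double[] {0.0, 0.0}, 2.0, means, 1.0)));
        // [0.5, 0.5, 2.0]
    }
}
```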
AddExpression creates a new attribute by applying a mathematical function to numeric attributes. The expression can contain attribute references and constants; the arithmetic operators +, -, *, /, and ^; the functions log and exp, abs and sqrt, floor, ceil and rint,⁵ and sin, cos, and tan; and parentheses. Attributes are specified by the prefix a; for example, a7 is the seventh attribute. An example expression is

a1^2*a5/log(a7*4.0)

There is a debug option that replaces the new attribute's value with a postfix parse of the supplied expression.
Whereas AddExpression applies mathematical functions, NumericTransform performs an arbitrary transformation by applying a given Java function to selected numeric attributes. The function can be anything that takes a double as its argument and returns another double, for example, sqrt() in java.lang.Math.
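The contrast between AddExpression and NumericTransform can be sketched in Java. Here the example expression a1^2*a5/log(a7*4.0) is written directly as a lambda rather than parsed from text, log is assumed to be the natural logarithm, and the helper names (ExpressionSketch, a, addExpression, numericTransform) are hypothetical. Math::sqrt plays the role of the java.lang.Math function mentioned in the text.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch contrasting the two filters; not Weka's own code.
final class ExpressionSketch {

    // a(row, 7) mirrors the a7 notation: 1-based attribute access.
    static double a(double[] row, int index) {
        return row[index - 1];
    }

    /** AddExpression-style: append f(row) as a new last attribute. */
    static double[] addExpression(double[] row, ToDoubleFunction<double[]> f) {
        double[] out = Arrays.copyOf(row, row.length + 1);
        out[row.length] = f.applyAsDouble(row);
        return out;
    }

    /** NumericTransform-style: replace the selected (1-based) attributes by fn(value). */
    static double[] numericTransform(double[] row, int[] selected, DoubleUnaryOperator fn) {
        double[] out = row.clone(); // leave the original instance untouched
        for (int index : selected) {
            out[index - 1] = fn.applyAsDouble(out[index - 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // a1 = 3, a5 = 2, a7 = 0.5; the other attributes do not matter here.
        double[] row = {3.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.5};
        // a1^2*a5/log(a7*4.0), written as a lambda (natural log assumed)
        ToDoubleFunction<double[]> expr =
            r -> Math.pow(a(r, 1), 2) * a(r, 5) / Math.log(a(r, 7) * 4.0);
        System.out.println(Arrays.toString(addExpression(row, expr)));
        // Apply java.lang.Math.sqrt to attributes 1 and 5.
        System.out.println(Arrays.toString(numericTransform(row, new int[] {1, 5}, Math::sqrt)));
    }
}
```

The design difference shows in the return values: addExpression lengthens the instance by one attribute, whereas numericTransform returns an instance of the same length with only the selected values replaced.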

⁵ The rint function rounds to the closest integer.
