Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

often applied to a training dataset and then also applied to the test file. If the
filter is supervised—for example, if it uses class values to derive good intervals
for discretization—applying it to the test data will bias the results. It is the
discretization intervals derived from the training data that must be applied to the
test data. When using supervised filters you must be careful to ensure that
the results are evaluated fairly, an issue that does not arise with unsupervised
filters.
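Below is a minimal sketch of that discipline, written in Java because Weka is written in Java. It does not use Weka's API. It assumes a single numeric attribute and its own simplified supervised rule: after sorting the training examples by value, place a cut point midway between neighbouring examples whose values and classes both differ. That rule is a stand-in for illustration, not the MDL-based method used by Weka's supervised discretization filter. The class TrainTestDiscretization, its two methods, and the toy data are hypothetical. The point of the sketch is structural. learnCutPoints sees only the training values and class labels, and applyCutPoints then reuses those same cut points, unchanged, on the test values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not Weka's Discretize filter.
public class TrainTestDiscretization {

    // Learn cut points from the TRAINING data only. The class labels are
    // consulted, so this is a supervised procedure. Simplified rule (an
    // assumption): after sorting by value, cut midway between neighbouring
    // examples whose values differ and whose classes differ.
    static double[] learnCutPoints(double[] values, String[] classes) {
        Integer[] order = new Integer[values.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> values[i]));

        List<Double> cuts = new ArrayList<>();
        for (int k = 1; k < order.length; k++) {
            int prev = order[k - 1];
            int cur = order[k];
            boolean valueChanges = values[prev] < values[cur];
            boolean classChanges = !classes[prev].equals(classes[cur]);
            if (valueChanges && classChanges) {
                cuts.add((values[prev] + values[cur]) / 2.0);
            }
        }
        return cuts.stream().mapToDouble(Double::doubleValue).toArray();
    }

    // Map a value to an interval index (0 .. cuts.length) using cut points
    // learned earlier. No class information about the value is used here.
    static int applyCutPoints(double value, double[] cuts) {
        int interval = 0;
        while (interval < cuts.length && value > cuts[interval]) {
            interval++;
        }
        return interval;
    }

    public static void main(String[] args) {
        double[] trainX = {1.0, 2.0, 3.0, 6.0, 7.0, 8.0};
        String[] trainY = {"no", "no", "no", "yes", "yes", "yes"};
        double[] testX = {2.5, 6.5};

        // Intervals are derived from the training data alone...
        double[] cuts = learnCutPoints(trainX, trainY);
        System.out.println("cut points: " + Arrays.toString(cuts));

        // ...and the same intervals are applied to the test data.
        for (double x : testX) {
            System.out.println(x + " -> interval " + applyCutPoints(x, cuts));
        }
    }
}
```

Running main prints the single cut point [4.5] and places the test values 2.5 and 6.5 in intervals 0 and 1. Within Weka itself, the comparable safeguard is to let a supervised filter learn its intervals from the training set only, for example by wrapping it together with the learning scheme in a FilteredClassifier, rather than filtering the training and test files independently.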
We treat Weka’s unsupervised and supervised filtering methods separately.
Within each type there is a further distinction between attribute filters, which
work on the attributes in the datasets, and instance filters, which work on the
instances. To learn more about a particular filter, select it in the Weka Explorer



Figure 10.16 (a) Visualizing the Iris dataset.