Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


10.2 EXPLORING THE EXPLORER


position, and various simple textural features. The training data file is supplied
with the Weka distribution and called segment-challenge.arff. Having loaded it
in, select the User Classifier. For evaluation use the special test set called
segment-test.arff as the Supplied test set on the Classify panel. Evaluation by
cross-validation is impossible when you have to construct a classifier manually for
each fold.
After you click Start, a new window appears and Weka waits for you to build the
classifier. The Tree Visualizer and Data Visualizer tabs switch between different
views. The former shows the current state of the classification tree, and each node
gives the number of instances of each class at that node. The aim is to come up
with a tree in which the leaf nodes are as pure as possible. Initially there is only
one node, the root, which contains all the data. Switch to the Data Visualizer to
create a split. This shows the same two-dimensional plot that we saw in Figure
10.6(b) for the Iris dataset and Figure 10.12 for the CPU performance data. The
attributes to use for X and Y are selected as before, and the goal here is to find a
combination that separates the classes as cleanly as possible. Figure 10.13(a)
shows a good choice: region-centroid-row for X and intensity-mean for Y.
Having found a good separation, you must specify a region in the graph. Four
tools for this appear in the pull-down menu below the Y-axis selector. Select
Instance identifies a particular instance. Rectangle (shown in Figure 10.13(a))
allows you to drag out a rectangle on the graph. With Polygon and Polyline you
build a free-form polygon or draw a free-form polyline (left-click to add a vertex
and right-click to complete the operation). Once an area has been selected, it
turns gray. In Figure 10.13(a) the user has defined a rectangle. The Submit
button creates two new nodes in the tree, one holding the selected instances and
the other with all the rest. Clear clears the selection; Save saves the instances in
the current tree node as an ARFF file.
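
The effect of Submit on a rectangular selection can be sketched in a few lines of Python. This is only an illustration of the idea, not Weka's implementation: the function names split_by_rectangle and class_counts are hypothetical, and the attribute values and rectangle bounds are made-up toy numbers rather than values from segment-challenge.arff.

```python
from collections import Counter

def split_by_rectangle(instances, x_attr, y_attr, x_min, x_max, y_min, y_max):
    """Partition instances into (inside, outside) a rectangle on two attributes."""
    inside, outside = [], []
    for inst in instances:
        in_rect = (x_min <= inst[x_attr] <= x_max
                   and y_min <= inst[y_attr] <= y_max)
        (inside if in_rect else outside).append(inst)
    return inside, outside

def class_counts(instances, class_attr="class"):
    """Count the instances of each class, as each tree node displays."""
    return Counter(inst[class_attr] for inst in instances)

# Toy data (assumed values, for illustration only).
node = [
    {"region-centroid-row": 20.0, "intensity-mean": 110.0, "class": "sky"},
    {"region-centroid-row": 35.0, "intensity-mean": 125.0, "class": "sky"},
    {"region-centroid-row": 150.0, "intensity-mean": 15.0, "class": "grass"},
    {"region-centroid-row": 120.0, "intensity-mean": 40.0, "class": "path"},
]

# The rectangle plays the role of the gray selected area in the plot.
selected, rest = split_by_rectangle(
    node, "region-centroid-row", "intensity-mean", 0.0, 60.0, 90.0, 150.0
)
print(class_counts(selected))  # the node holding the selected instances
print(class_counts(rest))      # the node holding all the rest
```

In this toy case the selected node contains only sky instances, whereas the other node is still mixed and would need further splitting.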
At this point, the Tree Visualizer shows the tree in Figure 10.13(b). There is
a pure node for the sky class, but the other node is mixed and should be split
further. Clicking on different nodes determines which subset of data is shown
by the Data Visualizer. Continue adding nodes until you are satisfied with the
result—that is, until the leaf nodes are mostly pure. Then right-click on any
blank space in the Tree Visualizer and choose Accept the Tree. Weka evaluates
your tree on the test set and outputs performance statistics (80% is a good score
on this problem).
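
The two quantities involved here, leaf purity and test-set accuracy, can be made concrete with a short Python sketch. It assumes one simple definition of purity (the fraction of a leaf's instances that belong to its majority class); the helper names purity and accuracy are hypothetical, and the labels are toy values, not Weka output.

```python
from collections import Counter

def purity(labels):
    """Fraction of labels in the majority class (1.0 means a pure leaf)."""
    if not labels:
        return 0.0
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)

def accuracy(predicted, actual):
    """Fraction of test instances whose predicted class matches the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Toy leaves (assumed labels, for illustration only).
leaf_a = ["sky", "sky", "sky"]
leaf_b = ["grass", "grass", "path", "grass"]
print(purity(leaf_a))  # 1.0: a pure leaf
print(purity(leaf_b))  # 0.75: a mostly pure leaf

# Toy test-set predictions compared with the true classes.
predicted = ["sky", "grass", "grass", "path", "sky"]
actual = ["sky", "grass", "path", "path", "sky"]
print(accuracy(predicted, actual))  # 0.8
```

High purity on the training data does not guarantee high accuracy on the test set, which is why the tree is evaluated on the separate file.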
Building trees manually is very tedious. But Weka can complete the task for
you by building a subtree under any node: just right-click the node.


Using a metalearner

Metalearners (Section 7.5) take simple classifiers and turn them into more powerful
learners. For example, to boost decision stumps in the Explorer, go to the
