an incremental way of restructuring the tree to compensate for incorrect choices
caused by infelicitous ordering of examples.
The final hierarchy for all 14 examples is shown at the end of Figure 6.17.
There are two major clusters, each of which subdivides further into its own
subclusters. If the play/don’t play distinction really represented an inherent
feature of the data, a single cluster would be expected for each outcome. No
such clean structure is observed, although a (very) generous eye might discern
a slight tendency at lower levels for yes instances to group together, and likewise for no instances. Careful analysis of the clustering reveals some anomalies.
(Table 4.6 will help if you want to follow this analysis in detail.) For example,
instances a and b are actually very similar to each other, yet they end up in completely different parts of the tree. Instance b ends up with k, which is a worse match than a. Instance a ends up with d and h, and it is certainly not as similar to d as it is to b. The reason why a and b become separated is that a and d get merged, as described previously, because they form the best and second-best hosts for h. It was unlucky that a and b were the first two examples: if either had occurred later, it may well have ended up with the other. Subsequent splitting and remerging may be able to rectify this anomaly, but in this case they didn’t.
Exactly the same scheme works for numeric attributes. Category utility is
defined for these as well, based on an estimate of the mean and standard devi-
ation of the value of that attribute. Details are deferred to the next subsection.
However, there is just one problem that we must attend to here: when estimat-
ing the standard deviation of an attribute for a particular node, the result will
be zero if the node contains only one instance, as it does more often than not.
Unfortunately, zero variances produce infinite values in the category utility
formula. A simple heuristic solution is to impose a minimum variance on each
attribute. It can be argued that because no measurement is completely precise,
it is reasonable to impose such a minimum: it represents the measurement error
in a single sample. This parameter is called acuity.
Figure 6.18(a) shows, at the top, a hierarchical clustering produced by the
incremental algorithm for part of the Iris dataset (30 instances, 10 from each
class). At the top level there are two clusters (i.e., subclusters of the single node
representing the whole dataset). The first contains both Iris virginicas and Iris versicolors, and the second contains only Iris setosas. The Iris setosas themselves
split into two subclusters, one with four cultivars and the other with six. The
other top-level cluster splits into three subclusters, each with a fairly complex
structure. Both the first and second contain only Iris versicolors, with one exception, a stray Iris virginica, in each case; the third contains only Iris virginicas.
This represents a fairly satisfactory clustering of the Iris data: it shows that the
three species are not artificial at all but reflect genuine differences in the data.
This is, however, a slightly overoptimistic conclusion, because quite a bit of
