
can be updated immediately. If not, look inside the node by proceeding recursively down the tree.
Figure 4.16 shows the same instances and ball tree as Figure 4.14, but with
two cluster centers marked as black stars. Because all instances are assigned to
the closest center, the space is divided in two by the thick line shown in Figure
4.16(a). Begin at the root of the tree in Figure 4.16(b), with initial values for the
vector sum and counts for each cluster; all initial values are zero. Proceed recur-
sively down the tree. When node A is reached, all points within it lie in cluster
1, so cluster 1’s sum and count can be updated with the sum and count for node
A, and we need descend no further. Recursing back to node B, its ball straddles
the boundary between the clusters, so its points must be examined individually.
When node C is reached, it falls entirely within cluster 2; again, we can update
cluster 2 immediately and need descend no further. The tree is only examined
down to the frontier marked by the dashed line in Figure 4.16(b), and the advan-
tage is that the nodes below need not be opened—at least, not on this particu-
lar iteration of k-means. Next time, the cluster centers will have changed and
things may be different.
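
In code, the pruned update looks roughly like the following Python sketch. It assumes a ball tree whose nodes cache the vector sum and count of the points they cover; the class and function names (BallNode, accumulate) are illustrative, the tree builder is omitted, and the whole-ball containment check is a conservative triangle-inequality test rather than the exact geometric one.

    import numpy as np

    class BallNode:
        """Ball tree node caching the vector sum and count of its points.

        Tree construction is omitted; leaves keep their points, internal
        nodes keep children built over subsets of the same points.
        """
        def __init__(self, points, children=()):
            self.count = len(points)
            self.sum = points.sum(axis=0).astype(float)
            self.center = self.sum / self.count
            self.radius = float(np.linalg.norm(points - self.center, axis=1).max())
            self.children = list(children)
            self.points = points if not self.children else None

    def accumulate(node, centers, sums, counts):
        """Accumulate per-cluster vector sums and counts for one k-means
        pass, pruning subtrees whose balls lie wholly in one cluster."""
        d = np.linalg.norm(centers - node.center, axis=1)
        j = int(d.argmin())
        # Triangle inequality: every point in the ball is at most d[j] + r
        # from center j, and at least d[i] - r from any other center i.
        if d[j] + node.radius < np.delete(d, j).min() - node.radius:
            sums[j] += node.sum          # whole ball belongs to cluster j:
            counts[j] += node.count      # use the cached statistics and stop
            return
        if node.children:                # ball straddles a boundary: recurse
            for child in node.children:
                accumulate(child, centers, sums, counts)
        else:                            # boundary leaf: assign points one by one
            for p in node.points:
                i = int(np.linalg.norm(centers - p, axis=1).argmin())
                sums[i] += p
                counts[i] += 1

Starting from sums initialized to a zero array of shape (k, dimensions) and counts to a zero vector of length k, a single traversal yields the same totals as a full pass over the instances, and each nonempty cluster's new center is simply sums[j] / counts[j], exactly as in ordinary k-means.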

Discussion


Many variants of the basic k-means procedure have been developed. Some
produce a hierarchical clustering by applying the algorithm with k=2 to the
overall dataset and then repeating, recursively, within each cluster.
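A compact sketch of that recursive bisection is shown below. The two_means routine is a bare-bones Lloyd's algorithm, and both the function names and the depth-based stopping rule are illustrative assumptions; the text does not fix a particular stopping criterion.

    import numpy as np

    def two_means(X, iters=10, seed=0):
        """Plain k-means with k = 2: returns a 0/1 cluster label per row of X."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
        for _ in range(iters):
            labels = np.argmin(np.linalg.norm(X[:, None, :] - centers, axis=2),
                               axis=1)
            for c in (0, 1):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return labels

    def bisect(X, depth):
        """Grow a cluster hierarchy by splitting each cluster in two, recursively."""
        if depth == 0 or len(X) < 2:
            return X                          # leaf: one block of instances
        labels = two_means(X)
        if labels.min() == labels.max():      # degenerate split; stop early
            return X
        return [bisect(X[labels == c], depth - 1) for c in (0, 1)]

The result is a nested list of instance blocks, which is one simple way of representing the hierarchy of clusters.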
How do you choose k? Often nothing is known about the likely number of
clusters, and the whole point of clustering is to find out. One way is to try dif-
ferent values and choose the best. To do this you need to learn how to evaluate
the success of machine learning, which is what Chapter 5 is about. We return
to clustering in Section 6.6.
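
As a rough illustration of trying different values and choosing the best (the proper treatment of evaluation is Chapter 5's subject), one common heuristic is to sweep k and watch the within-cluster sum of squared distances. The names choose_k and within_ss below are hypothetical, and kmeans stands for any clustering routine that returns a label per instance.

    import numpy as np

    def within_ss(X, labels, k):
        """Total squared distance of each point to its own cluster's mean."""
        return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(k) if np.any(labels == c))

    def choose_k(X, kmeans, k_values=range(2, 11)):
        """Run k-means for each candidate k and report the within-cluster score.

        The score typically falls as k grows, so in practice one looks for
        the "elbow" where further increases in k stop paying off, rather
        than simply taking the minimum.
        """
        return {k: within_ss(X, kmeans(X, k), k) for k in k_values}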

4.9 Further reading


The 1R scheme was proposed and thoroughly investigated by Holte (1993). It
was never really intended as a machine learning “method”: the point was more
to demonstrate that very simple structures underlie most of the practical
datasets being used to evaluate machine learning methods at the time and that
putting high-powered inductive inference methods to work on simple datasets
was like using a sledgehammer to crack a nut. Why grapple with a complex deci-
sion tree when a simple rule will do? The method that generates one simple rule
per class is the result of work by Lucio de Souza Coelho of Brazil and Len Trigg
of New Zealand, and it has been dubbed hyperpipes. A very simple algorithm,
it has the advantage of being extremely fast and is quite feasible even with an
enormous number of attributes.
