
can be updated immediately. If not, look inside the node by proceeding recursively down the tree.
Figure 4.16 shows the same instances and ball tree as Figure 4.14, but with
two cluster centers marked as black stars. Because all instances are assigned to
the closest center, the space is divided in two by the thick line shown in Figure
4.16(a). Begin at the root of the tree in Figure 4.16(b), with initial values for the
vector sum and counts for each cluster; all initial values are zero. Proceed recur-
sively down the tree. When node A is reached, all points within it lie in cluster
1, so cluster 1’s sum and count can be updated with the sum and count for node
A, and we need descend no further. Recursing back to node B, its ball straddles
the boundary between the clusters, so its points must be examined individually.
When node C is reached, it falls entirely within cluster 2; again, we can update
cluster 2 immediately and need descend no further. The tree is only examined
down to the frontier marked by the dashed line in Figure 4.16(b), and the advan-
tage is that the nodes below need not be opened—at least, not on this particu-
lar iteration of k-means. Next time, the cluster centers will have changed and
things may be different.
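
In code, the pruned update looks roughly like the following Python sketch. It assumes a ball tree whose nodes cache the vector sum and count of the points they cover; the class and function names (BallNode, accumulate) are illustrative, the tree builder is omitted, and the whole-ball containment check is a conservative triangle-inequality test rather than the exact geometric one.

    import numpy as np

    class BallNode:
        """Ball tree node caching the vector sum and count of its points.

        Tree construction is omitted; leaves keep their points, internal
        nodes keep children built over subsets of the same points.
        """
        def __init__(self, points, children=()):
            self.count = len(points)
            self.sum = points.sum(axis=0).astype(float)
            self.center = self.sum / self.count
            self.radius = float(np.linalg.norm(points - self.center, axis=1).max())
            self.children = list(children)
            self.points = points if not self.children else None

    def accumulate(node, centers, sums, counts):
        """Accumulate per-cluster vector sums and counts for one k-means
        pass, pruning subtrees whose balls lie wholly in one cluster."""
        d = np.linalg.norm(centers - node.center, axis=1)
        j = int(d.argmin())
        # Triangle inequality: every point in the ball is at most d[j] + r
        # from center j, and at least d[i] - r from any other center i.
        if d[j] + node.radius < np.delete(d, j).min() - node.radius:
            sums[j] += node.sum          # whole ball belongs to cluster j:
            counts[j] += node.count      # use the cached statistics and stop
            return
        if node.children:                # ball straddles a boundary: recurse
            for child in node.children:
                accumulate(child, centers, sums, counts)
        else:                            # boundary leaf: assign points one by one
            for p in node.points:
                i = int(np.linalg.norm(centers - p, axis=1).argmin())
                sums[i] += p
                counts[i] += 1

Starting from sums initialized to a zero array of shape (k, dimensions) and counts to a zero vector of length k, a single traversal yields the same totals as a full pass over the instances, and each nonempty cluster's new center is simply sums[j] / counts[j], exactly as in ordinary k-means.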

Discussion


Many variants of the basic k-means procedure have been developed. Some
produce a hierarchical clustering by applying the algorithm with k=2 to the
overall dataset and then repeating, recursively, within each cluster.
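A compact sketch of that recursive bisection is shown below. The two_means routine is a bare-bones Lloyd's algorithm, and both the function names and the depth-based stopping rule are illustrative assumptions; the text does not fix a particular stopping criterion.

    import numpy as np

    def two_means(X, iters=10, seed=0):
        """Plain k-means with k = 2: returns a 0/1 cluster label per row of X."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
        for _ in range(iters):
            labels = np.argmin(np.linalg.norm(X[:, None, :] - centers, axis=2),
                               axis=1)
            for c in (0, 1):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return labels

    def bisect(X, depth):
        """Grow a cluster hierarchy by splitting each cluster in two, recursively."""
        if depth == 0 or len(X) < 2:
            return X                          # leaf: one block of instances
        labels = two_means(X)
        if labels.min() == labels.max():      # degenerate split; stop early
            return X
        return [bisect(X[labels == c], depth - 1) for c in (0, 1)]

The result is a nested list of instance blocks, which is one simple way of representing the hierarchy of clusters.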
How do you choose k? Often nothing is known about the likely number of
clusters, and the whole point of clustering is to find out. One way is to try dif-
ferent values and choose the best. To do this you need to learn how to evaluate
the success of machine learning, which is what Chapter 5 is about. We return
to clustering in Section 6.6.
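
As a rough illustration of trying different values and choosing the best (the proper treatment of evaluation is Chapter 5's subject), one common heuristic is to sweep k and watch the within-cluster sum of squared distances. The names choose_k and within_ss below are hypothetical, and kmeans stands for any clustering routine that returns a label per instance.

    import numpy as np

    def within_ss(X, labels, k):
        """Total squared distance of each point to its own cluster's mean."""
        return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(k) if np.any(labels == c))

    def choose_k(X, kmeans, k_values=range(2, 11)):
        """Run k-means for each candidate k and report the within-cluster score.

        The score typically falls as k grows, so in practice one looks for
        the "elbow" where further increases in k stop paying off, rather
        than simply taking the minimum.
        """
        return {k: within_ss(X, kmeans(X, k), k) for k in k_values}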

4.9 Further reading


The 1R scheme was proposed and thoroughly investigated by Holte (1993). It
was never really intended as a machine learning “method”: the point was more
to demonstrate that very simple structures underlie most of the practical
datasets being used to evaluate machine learning methods at the time and that
putting high-powered inductive inference methods to work on simple datasets
was like using a sledgehammer to crack a nut. Why grapple with a complex deci-
sion tree when a simple rule will do? The method that generates one simple rule
per class is the result of work by Lucio de Souza Coelho of Brazil and Len Trigg
of New Zealand, and it has been dubbed hyperpipes. A very simple algorithm,
it has the advantage of being extremely fast and is quite feasible even with an
enormous number of attributes.
