Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

The procedure is best illustrated by an example. We will use the familiar
weather data again, but without the play attribute. To track progress, the 14
instances are labeled a, b, c, ..., n (as in Table 4.6), and for interest we include
the class yes or no in the label, although it should be emphasized that for this
artificial dataset there is little reason to suppose that the two classes of instance
should fall into separate categories. Figure 6.17 shows the situation at salient
points throughout the clustering procedure.
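
The labeling scheme can be sketched in a few lines of Python. This is only an illustration: Table 4.6 is not reproduced here, so the rows below are written out on the assumption that they are the standard nominal weather data in the order a to n, and the names `WEATHER`, `ATTRIBUTES`, and `labeled` are hypothetical.

```python
# Assumed: the standard nominal weather data, in the order a..n.
# The last field of each row is the class (the play attribute).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")

# Map a label such as "a:no" to the instance with the play attribute removed.
# The class appears only in the label, so the clusterer never sees it.
labeled = {}
for code, row in zip("abcdefghijklmn", WEATHER):
    *features, play = row
    labeled[f"{code}:{play}"] = tuple(features)
```

Keeping the class in the label only makes it easy to check afterwards whether yes and no instances ended up in separate clusters, without letting the class influence the clustering.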
At the beginning, when new instances are absorbed into the structure, they
each form their own subcluster under the overall top-level cluster. Each new
instance is processed by tentatively placing it into each of the existing leaves and
evaluating the category utility of the resulting set of the top-level node’s children
to see whether the leaf is a good “host” for the new instance. For each of

256 CHAPTER 6 | IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES


Figure 6.17 Clustering the weather data. [Figure: successive snapshots of the cluster tree as instances a:no through n:no are added; leaves are labeled by instance and class.]
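
The host-selection step described in the text can be sketched as follows. This is a minimal illustration, not the book's implementation: it assumes the usual category utility definition for nominal attributes (the cluster-weighted gain in the sum of squared attribute-value probabilities, divided by the number of clusters), and it assumes that giving the new instance a leaf of its own is one of the candidates, as the text implies when it says early instances each form their own subcluster. The function names `category_utility` and `best_host` are hypothetical.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition; each cluster is a non-empty list of
    instances, and each instance is a tuple of nominal attribute values."""
    everything = [inst for cluster in clusters for inst in cluster]
    n_attrs = len(everything[0])

    def sum_squared_probs(instances):
        # Sum over attributes and values of Pr[attribute = value]^2.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(inst[i] for inst in instances)
            total += sum((c / len(instances)) ** 2 for c in counts.values())
        return total

    base = sum_squared_probs(everything)
    gain = sum(
        len(cluster) / len(everything) * (sum_squared_probs(cluster) - base)
        for cluster in clusters
    )
    return gain / len(clusters)


def best_host(children, new_instance):
    """Return (index, utility) for the best placement of new_instance among
    the top-level children; index is None when a new leaf scores best."""
    best_index = None
    best_cu = category_utility(children + [[new_instance]])
    for i in range(len(children)):
        # Tentatively add the instance to child i and leave the others as is.
        trial = [c + [new_instance] if j == i else c
                 for j, c in enumerate(children)]
        cu = category_utility(trial)
        if cu > best_cu:
            best_index, best_cu = i, cu
    return best_index, best_cu


# Usage: two existing leaves (instances a and b) and a new instance c.
a = ("sunny", "hot", "high", "false")
b = ("sunny", "hot", "high", "true")
c = ("overcast", "hot", "high", "false")
host, utility = best_host([[a], [b]], c)
```

The sketch compares only placements at the top level; restructuring steps, such as merging or splitting nodes, are deliberately left out.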