Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

$$\text{info}([2,3]) = 0.971 \text{ bits}$$
$$\text{info}([4,0]) = 0.0 \text{ bits}$$
$$\text{info}([3,2]) = 0.971 \text{ bits}$$

We can calculate the average information value of these, taking into account the number of instances that go down each branch (five down the first and third, four down the second):

$$\text{info}([2,3],[4,0],[3,2]) = \tfrac{5}{14} \times 0.971 + \tfrac{4}{14} \times 0 + \tfrac{5}{14} \times 0.971 = 0.693 \text{ bits}$$

This average represents the amount of information that we expect would be necessary to specify the class of a new instance, given the tree structure in Figure 4.2(a).
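The arithmetic above can be checked with a short script. This is a minimal sketch, assuming the usual entropy definition info([a, b]) = -p log2 p - q log2 q, where p and q are the class proportions at a node and 0 × log2 0 is taken as 0. The helper names `info` and `average_info` are hypothetical, chosen for illustration only. At full precision the weighted average comes to about 0.6935 bits, which the text reports as 0.693 bits.

```python
from math import log2


def info(counts):
    """Entropy in bits of a class distribution given as a list of counts (hypothetical helper)."""
    total = sum(counts)
    # 0 * log2(0) is taken as 0, so empty classes are skipped
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def average_info(branches):
    """Expected information over a split, weighting each branch by its share of instances (hypothetical helper)."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * info(b) for b in branches)


# [yes, no] counts at the three leaves of the outlook stump in Figure 4.2(a)
outlook = [[2, 3], [4, 0], [3, 2]]

for branch in outlook:
    print(branch, round(info(branch), 3))
print("average:", round(average_info(outlook), 4))
```

Weighting by branch size is what makes a small, pure branch such as overcast count for less than a large, mixed one.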

98 CHAPTER 4| ALGORITHMS: THE BASIC METHODS


Figure 4.2 Tree stumps for the weather data. Each leaf lists its [yes, no] class counts:
(a) outlook: sunny [2, 3]; overcast [4, 0]; rainy [3, 2]
(b) temperature: hot [2, 2]; mild [4, 2]; cool [3, 1]
(c) humidity: high [3, 4]; normal [6, 1]
(d) windy: false [6, 2]; true [3, 3]
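The same weighted-average calculation can be applied to each of the four stumps in Figure 4.2. This is a sketch only, assuming the [yes, no] leaf counts read off the figure and the standard entropy definition. The names `info`, `average_info`, and `stumps` are hypothetical and used purely for illustration.

```python
from math import log2


def info(counts):
    """Entropy in bits of a class distribution given as a list of counts (hypothetical helper)."""
    total = sum(counts)
    # 0 * log2(0) is taken as 0, so empty classes are skipped
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def average_info(branches):
    """Expected information over a split, weighted by branch size (hypothetical helper)."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * info(b) for b in branches)


# [yes, no] counts at each leaf, read off Figure 4.2 (each stump covers all 14 instances)
stumps = {
    "outlook": [[2, 3], [4, 0], [3, 2]],
    "temperature": [[2, 2], [4, 2], [3, 1]],
    "humidity": [[3, 4], [6, 1]],
    "windy": [[6, 2], [3, 3]],
}

for name, branches in stumps.items():
    print(f"{name:12s} {average_info(branches):.3f} bits")

# The stump whose leaves leave the least expected uncertainty about the class
print("lowest:", min(stumps, key=lambda s: average_info(stumps[s])))
```

With these counts, outlook gives the lowest expected information of the four stumps. A lower value means the leaves of that stump leave less uncertainty about the class of a new instance.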