Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


4.3 DIVIDE-AND-CONQUER: CONSTRUCTING DECISION TREES



  1. When the number of either yes’s or no’s is zero, the information is
    zero.

  2. When the number of yes’s and no’s is equal, the information reaches a
    maximum.


Moreover, the measure should be applicable to multiclass situations, not just to
two-class ones.
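
The two properties listed above, and the multiclass requirement, can be checked numerically. The sketch below assumes the measure is Shannon entropy in bits computed from raw class counts; that choice is an assumption at this point in the text, and the function name `info` is borrowed from the book's notation purely for illustration.

```python
import math

def info(counts):
    """Entropy, in bits, of a class distribution given as raw counts (assumed measure)."""
    total = sum(counts)
    # Classes with a zero count contribute nothing to the sum.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

print(info([9, 0]))     # one class absent: property 1
print(info([7, 7]))     # equal counts: property 2, the two-class maximum
print(info([9, 5]))     # unequal counts: strictly between the two extremes
print(info([4, 4, 4]))  # three classes: the measure is not limited to two
```

With equal counts over k classes this measure gives log2(k) bits, so the two-class maximum is 1 bit.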
The information measure relates to the amount of information obtained by
making a decision, and a more subtle property of information can be derived
by considering the nature of decisions. Decisions can be made in a single stage,
or they can be made in several stages, and the amount of information involved
is the same in both cases. For example, the decision involved in

    info([2,3,4])

can be made in two stages. First decide whether it’s the first case or one of the
other two cases:

    info([2,7])

and then decide which of the other two cases it is:

    info([3,4])

In some cases the second decision will not need to be made, namely, when
the decision turns out to be the first one. Taking this into account leads to the
equation

    info([2,3,4]) = info([2,7]) + (7/9) × info([3,4]).
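
The multistage equation can be confirmed with a few lines of arithmetic. This is a minimal sketch that assumes `info` is Shannon entropy in bits over raw counts; the helper and the variable names are illustrative, not taken from the text.

```python
import math

def info(counts):
    """Entropy, in bits, of a class distribution given as raw counts (assumed measure)."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

# Deciding among the three cases in a single stage.
single_stage = info([2, 3, 4])

# First stage: the first case (2) versus the other two combined (3 + 4 = 7).
# Second stage: needed only for 7 of the 9 instances, hence the 7/9 weight.
two_stage = info([2, 7]) + (7 / 9) * info([3, 4])

print(single_stage, two_stage)
assert math.isclose(single_stage, two_stage)
```

The weight 7/9 is the fraction of instances for which the second decision is actually required, which is why the second term is scaled rather than added in full.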

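Figure 4.4 Decision tree for the weather data. [Figure not reproduced: the root tests outlook (sunny, overcast, rainy); the sunny branch tests humidity (high, normal), the overcast branch is a yes leaf, and the rainy branch tests windy (false, true).]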
