Calculating information
Now it is time to explain how to calculate the information measure that is used
as a basis for evaluating different splits. We describe the basic idea in this section,
then in the next we examine a correction that is usually made to counter a bias
toward selecting splits on attributes with large numbers of possible values.
Before examining the detailed formula for calculating the amount of information
required to specify the class of an example given that it reaches a tree
node with a certain number of yes’s and no’s, consider first the kind of properties
we would expect this quantity to have:
Figure 4.3 Expanded tree stumps for the weather data: the sunny branch of outlook split further on (a) temperature (hot, mild, cool), (b) humidity (high, normal), and (c) windy (false, true).