Calculating information
Now it is time to explain how to calculate the information measure that is used
as a basis for evaluating different splits. We describe the basic idea in this section,
then in the next we examine a correction that is usually made to counter a bias
toward selecting splits on attributes with large numbers of possible values.
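As a rough preview of what such a measure looks like in practice, the sketch below assumes that the information value of a node is the entropy, in bits, of its class counts; that formula has not yet been introduced at this point in the text, so treat it as an assumption. The function names info and split_info are hypothetical, and the [yes, no] counts are those of the five outlook = sunny examples in the standard weather data.

```python
from math import log2

def info(counts):
    """Entropy, in bits, of a list of class counts such as [yes, no]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_info(branches):
    """Average entropy of the branches, weighted by branch size."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)

# Assumed [yes, no] counts for the outlook = sunny examples (2 yes, 3 no),
# broken down by the value of each candidate attribute.
sunny = [2, 3]
stumps = {
    "temperature": [[0, 2], [1, 1], [1, 0]],  # hot, mild, cool
    "humidity": [[0, 3], [2, 0]],             # high, normal
    "windy": [[1, 2], [1, 1]],                # false, true
}

# Gain = information before the split minus information after it.
gains = {name: info(sunny) - split_info(b) for name, b in stumps.items()}
for name, g in gains.items():
    print(f"{name}: {g:.3f} bits")
```

Under these assumptions the split on humidity yields the largest gain, because both of its branches contain examples of a single class.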
Before examining the detailed formula for calculating the amount of information
required to specify the class of an example given that it reaches a tree
node with a certain number of yes’s and no’s, consider first the kind of properties
we would expect this quantity to have:
Figure 4.3 Expanded tree stumps for the weather data. Each panel splits the outlook = sunny branch further: (a) on temperature (hot, mild, cool); (b) on humidity (high, normal); (c) on windy (false, true). The leaves list the yes and no classes of the examples that reach them.