For example, Figure 6.22(b) contains no expansion for windy =falsefrom the
root node because with eight instances it is the most populous expansion:
the value falseoccurs more often in the data than the value true. Similarly, from
the node labeled humidity =normalthere is no expansion for windy =false
because falseis the most common value for windyamong all instances with
humidity =normal. In fact, in our example the second restriction—namely, that
expansions with zero counts are omitted—never kicks in because the first
restriction precludes any path that starts with the tests humidity =normaland
windy =false,which is the only way to reach the solitary zero in Figure 6.22(a).
Each node of the tree represents the occurrence of a particular combina-
tion of attribute values. It is straightforward to retrieve the count for a
combination that occurs in the tree. However, the tree does not explicitly repre-
sent many nonzero counts because the most populous expansion for each
attribute is omitted. For example, the combination humidity = highand
play =yesoccurs three times in the data but has no node in the tree. Neverthe-
less, it turns out that any count can be calculated from those that the tree stores
explicitly.
Here’s a simple example. Figure 6.22(b) contains no node for humidity =
normal, windy =true,and play =yes. However, it shows three instances with
humidity =normaland windy =true,and one of them has a value for playthat
is different from yes.It follows that there must be two instances for play =yes.
Now for a trickier case: how many times does humidity =high, windy =true,
and play =nooccur? At first glance it seems impossible to tell because there is
no branch for humidity =high. However, we can deduce the number by calcu-
lating the count for windy =trueand play =no(3) and subtracting the count
for humidity =normal, windy =true,and play =no(1). This gives 2, the correct
value.
This idea works for any subset of attributes and any combination of attrib-
ute values, but it may have to be applied recursively. For example, to obtain the
count for humidity =high, windy =false,and play =no,we need the count for
windy =false and play =noand the count for humidity =normal, windy =false,
and play =no. We obtain the former by subtracting the count for windy =true
and play =no(3) from the count for play =no(5), giving 2, and the latter by
subtracting the count for humidity =normal, windy =true,and play =no(1)
from the count for humidity =normaland play =no(1), giving 0. Thus there
must be 2 - 0 =2 instances with humidity =high, windy =false,and play =no,
which is correct.
AD trees only pay off if the data contains many thousands of instances. It is
pretty obvious that they do not help on the weather data. The fact that they
yield no benefit on small datasets means that, in practice, it makes little sense
to expand the tree all the way down to the leaf nodes. Usually, a cutoff param-
eter kis employed, and nodes covering fewer than kinstances hold a list of point-
282 CHAPTER 6| IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES