Bayesian multinet. To obtain a prediction for a particular class value, take the
corresponding network’s probability and multiply it by the class’s prior
probability. Do this for each class and normalize the result as we did previously.
In this case we would not use the conditional likelihood to learn the network for
each class value.
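As a rough illustration of this combination step (a sketch, not the book’s code),
the following assumes network_prob holds the probability each class’s network
assigns to the instance and prior holds the class prior probabilities:

    # Sketch: combining per-class network outputs in a Bayesian multinet.
    # network_prob[c] is the probability the class-c network assigns to the
    # instance; prior[c] is that class's prior probability (both are
    # hypothetical inputs used only for illustration).
    def multinet_predict(network_prob, prior):
        scores = {c: network_prob[c] * prior[c] for c in prior}
        total = sum(scores.values())          # normalize as before
        return {c: s / total for c, s in scores.items()}

    # Example with made-up network probabilities and the weather-data priors:
    print(multinet_predict({"yes": 0.030, "no": 0.010}, {"yes": 9/14, "no": 5/14}))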
All the network learning algorithms introduced previously are score based.
A different strategy, which we will not explain here, is to piece a network
together by testing individual conditional independence assertions based on
subsets of the attributes. This is known as structure learning by conditional
independence tests.

Data structures for fast learning


Learning Bayesian networks involves a lot of counting. For each network
structure considered in the search, the data must be scanned afresh to obtain the
counts needed to fill out the conditional probability tables. Instead, could they
be stored in a data structure that eliminated the need for scanning the data over
and over again? An obvious way is to precompute the counts and store the
nonzero ones in a table—say, the hash table mentioned in Section 4.5. Even so,
any nontrivial dataset will have a huge number of nonzero counts.
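A minimal sketch of that idea (assuming instances are represented as
attribute-to-value dictionaries; this is not the book’s implementation) tallies
every combination of attribute-value pairs occurring in the data into a single
hash table:

    from collections import Counter
    from itertools import combinations

    # Sketch: precompute every nonzero count in one pass and keep it in a
    # hash table (a Counter keyed by frozensets of attribute-value pairs).
    # A count over a subset of the attributes is one in which the remaining
    # attributes are "missing".
    def precompute_counts(instances):
        counts = Counter()
        for inst in instances:
            pairs = list(inst.items())
            for r in range(1, len(pairs) + 1):
                for combo in combinations(pairs, r):
                    counts[frozenset(combo)] += 1
        return counts

    # Any count a conditional probability table needs is then a lookup, e.g.
    # counts[frozenset({("humidity", "normal"), ("play", "no")})]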
Again, consider the weather data from Table 1.2 (page 11). There are
five attributes, two with three values and three with two values. This gives
4 × 4 × 3 × 3 × 3 = 432 possible counts. Each component of the product
corresponds to an attribute, and its contribution to the product is one more than
the number of its values because the attribute may be missing from the count. All
these counts can be calculated by treating them as item sets, as explained in
Section 4.5, and setting the minimum coverage to one. But even without storing
counts that are zero, this simple scheme runs into memory problems very quickly.
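As a quick check of that arithmetic (a throwaway snippet, not from the book),
each attribute contributes one more than its number of values:

    # Outlook and temperature have three values; humidity, windy, and play
    # have two. Each contributes (number of values + 1), the extra case being
    # "attribute missing from the count".
    possible = 1
    for num_values in [3, 3, 2, 2, 2]:
        possible *= num_values + 1
    print(possible)   # 4 * 4 * 3 * 3 * 3 = 432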
It turns out that the counts can be stored effectively in a structure called an
all-dimensions (AD) tree, which is analogous to the kD-trees used for
nearest-neighbor search described in Section 4.7. For simplicity, we illustrate
this using a reduced version of the weather data that only has the attributes
humidity, windy, and play. Figure 6.22(a) summarizes the data. The number of
possible counts is 3 × 3 × 3 = 27, although only 8 of them are shown. For
example, the count for play = no is 5 (count them!).
Figure 6.22(b) shows an AD tree for this data. Each node says how many
instances exhibit the attribute values that are tested along the path from the root
to that node. For example, the leftmost leaf says that there is one instance with
values humidity = normal, windy = true, and play = no, and the rightmost leaf
says that there are five instances with play = no.
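As a rough sketch of such a structure (a simplification that stores a count for
every attribute-value combination actually present, ignoring the pruning that
makes real AD trees compact), consider the following, which uses the reduced
weather data of Figure 6.22(a):

    # Sketch of an AD-tree-like count structure: each node stores how many
    # instances match the attribute-value tests on the path from the root,
    # and children only expand attributes later in a fixed ordering.
    attributes = ["humidity", "windy", "play"]

    # Reduced weather data (humidity, windy, play), one tuple per instance,
    # following Table 1.2.
    data = [
        ("high", "false", "no"),    ("high", "true", "no"),
        ("high", "false", "yes"),   ("high", "false", "yes"),
        ("normal", "false", "yes"), ("normal", "true", "no"),
        ("normal", "true", "yes"),  ("high", "false", "no"),
        ("normal", "false", "yes"), ("normal", "false", "yes"),
        ("normal", "true", "yes"),  ("high", "true", "yes"),
        ("normal", "false", "yes"), ("high", "true", "no"),
    ]
    instances = [dict(zip(attributes, row)) for row in data]

    def build(insts, start=0):
        node = {"count": len(insts), "children": {}}
        for i in range(start, len(attributes)):
            attr = attributes[i]
            for value in sorted({inst[attr] for inst in insts}):
                subset = [inst for inst in insts if inst[attr] == value]
                node["children"][(attr, value)] = build(subset, i + 1)
        return node

    def lookup(node, tests):
        # Follow one branch per attribute-value test, in attribute order.
        for test in tests:
            node = node["children"][test]
        return node["count"]

    tree = build(instances)
    print(lookup(tree, [("humidity", "normal"), ("windy", "true"), ("play", "no")]))  # 1
    print(lookup(tree, [("play", "no")]))                                             # 5

The two lookups reproduce the counts mentioned above: one instance with
humidity = normal, windy = true, and play = no, and five instances with
play = no.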
It would be trivial to construct a tree that enumerates all 27 counts
explicitly. However, that would gain nothing over a plain table and is obviously not
