6.7 BAYESIAN NETWORKS 283
ers to these instances rather than a list of pointers to other nodes. This makes
the trees smaller and more efficient to use.
Discussion
The K2 algorithm for learning Bayesian networks was introduced by Cooper
and Herskovits (1992). Bayesian scoring metrics are covered by Heckerman et
al. (1995). The TAN algorithm was introduced by Friedman et al. (1997), who
also describes multinets. Grossman and Domingos (2004) show how to use the
conditional likelihood for scoring networks. Guo and Greiner (2004) present an
extensive comparison of scoring metrics for Bayesian network classifiers. Bouck-
aert (1995) describes averaging over subnetworks. AD trees were introduced and
analyzed by Moore and Lee (1998)—the same Andrew Moore whose work on
kD-trees and ball trees was mentioned in Section 4.9. In a more recent paper,
Komarek and Moore (2000) introduce AD trees for incremental learning that
are also more efficient for datasets with many attributes.
We have only skimmed the surface of the subject of learning Bayesian net-
works. We left open questions of missing values, numeric attributes, and hidden
attributes. We did not describe how to use Bayesian networks for regression
tasks. Bayesian networks are a special case of a wider class of statistical models
called graphical models, which include networks with undirected edges
(called Markov networks). Graphical models are attracting great attention in the
machine learning community today.