

For example, consider an instance with values outlook = rainy, temperature = cool, humidity = high, and windy = true. To calculate the probability for play = no, observe that the network in Figure 6.21 gives probability 0.367 from node play, 0.385 from outlook, 0.429 from temperature, 0.250 from humidity, and 0.167 from windy. The product is 0.0025. The same calculation for play = yes yields 0.0077. However, these are clearly not the final answer: the final probabilities must sum to 1, whereas 0.0025 and 0.0077 don't. They are actually the joint probabilities Pr[play = no, E] and Pr[play = yes, E], where E denotes all the evidence given by the instance's attribute values. Joint probabilities measure the likelihood of observing an instance that exhibits the attribute values in E as well as the respective class value. They only sum to 1 if they exhaust the space of all possible attribute–value combinations, including the class attribute. This is certainly not the case in our example.
The solution is quite simple (we already encountered it in Section 4.2). To obtain the conditional probabilities Pr[play = no | E] and Pr[play = yes | E], normalize the joint probabilities by dividing them by their sum. This gives probability 0.245 for play = no and 0.755 for play = yes.
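
The arithmetic is easy to check. The following minimal Python sketch performs both steps, multiplying the five probabilities read from the network for play = no and then normalizing; the play = yes product 0.0077 is taken directly from the text, because its individual factors come from the yes entries of Figure 6.21, which are not listed here:

```python
import math

# Probabilities read off the network in Figure 6.21 for play = no, given
# outlook = rainy, temperature = cool, humidity = high, and windy = true.
factors_no = [0.367, 0.385, 0.429, 0.250, 0.167]

joint_no = math.prod(factors_no)  # Pr[play = no, E], about 0.0025
joint_yes = 0.0077                # Pr[play = yes, E], as stated in the text

# Normalize: divide each joint probability by their sum so that the
# two conditional probabilities sum to 1.
total = joint_no + joint_yes
print(f"Pr[play = no | E]  = {joint_no / total:.3f}")   # about 0.245
print(f"Pr[play = yes | E] = {joint_yes / total:.3f}")  # about 0.755
```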
Just one mystery remains: why multiply all those probabilities together? It turns out that the validity of the multiplication step hinges on a single assumption: given values for each of a node's parents, knowing the values for any other ancestors does not change the probability associated with each of its possible values. In other words, ancestors do not provide any information about the likelihood of the node's values over and above the information provided by the parents. This can be written

\[ \Pr[\mathit{node} \mid \mathit{ancestors}] = \Pr[\mathit{node} \mid \mathit{parents}], \]
which must hold for all values of the nodes and attributes involved. In statistics this property is called conditional independence. Multiplication is valid provided that each node is conditionally independent of its grandparents, great-grandparents, and so on, given its parents. The multiplication step results directly from the chain rule in probability theory, which states that the joint probability of $n$ attributes $a_i$ can be decomposed into this product:

\[ \Pr[a_1, a_2, \ldots, a_n] = \prod_{i=1}^{n} \Pr[a_i \mid a_{i-1}, \ldots, a_1] \]
The decomposition holds for any order of the attributes. Because our Bayesian network is an acyclic graph, its nodes can be ordered to give all ancestors of a node $a_i$ indices smaller than $i$. Then, because of the conditional independence assumption,

\[ \Pr[a_1, a_2, \ldots, a_n] = \prod_{i=1}^{n} \Pr[a_i \mid a_{i-1}, \ldots, a_1] = \prod_{i=1}^{n} \Pr[a_i \mid a_i\text{'s parents}]. \]
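The factored form translates directly into a table-lookup loop: one conditional probability per node, multiplied together. The sketch below is illustrative only; the dictionary layout and network fragment are assumptions for demonstration, not the book's implementation. Of the numbers, Pr[play = no] = 0.367 and Pr[outlook = rainy | play = no] = 0.385 come from the text, Pr[play = yes] = 0.633 is the complement, and the remaining value is a placeholder:

```python
# Each node maps to (parent names, CPT); the CPT maps a tuple of
# (node value, parent values...) to a conditional probability.
# Hypothetical two-node fragment, not the full network of Figure 6.21.
network = {
    "play": ((), {("yes",): 0.633, ("no",): 0.367}),
    "outlook": (("play",), {("rainy", "no"): 0.385,     # from the text
                            ("rainy", "yes"): 0.333}),  # assumed value
}

def joint_probability(network, assignment):
    """Pr[a_1, ..., a_n] as the product over nodes of Pr[node | parents]."""
    p = 1.0
    for node, (parents, cpt) in network.items():
        key = (assignment[node],) + tuple(assignment[q] for q in parents)
        p *= cpt[key]  # one lookup of Pr[a_i | a_i's parents]
    return p

# Pr[play = no, outlook = rainy] = 0.367 * 0.385, about 0.141
print(joint_probability(network, {"play": "no", "outlook": "rainy"}))
```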