
With this factorization we can appeal to the general result (10.9) to find expressions for the optimal factors. Consider first the distribution q(w). Discarding terms that are independent of w, we have

\begin{align*}
\ln q(\mathbf{w}) &= \mathbb{E}_{\alpha}\bigl[\ln\{h(\mathbf{w},\boldsymbol{\xi})\,p(\mathbf{w}|\alpha)\,p(\alpha)\}\bigr] + \text{const} \\
&= \ln h(\mathbf{w},\boldsymbol{\xi}) + \mathbb{E}_{\alpha}[\ln p(\mathbf{w}|\alpha)] + \text{const}.
\end{align*}
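For reference, the general result (10.9) is the standard mean-field update

\[
\ln q_j^{\star}(\mathbf{Z}_j) = \mathbb{E}_{i \neq j}\bigl[\ln p(\mathbf{X},\mathbf{Z})\bigr] + \text{const},
\]

applied here with the joint h(w, ξ) p(w|α) p(α). For q(w) the expectation is taken with respect to q(α) alone, which is why only the expectation over α appears above.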

We now substitute for ln h(w, ξ) using (10.153), and for ln p(w|α) using (10.165), giving

\[
\ln q(\mathbf{w}) = -\frac{\mathbb{E}[\alpha]}{2}\,\mathbf{w}^{\mathrm{T}}\mathbf{w} + \sum_{n=1}^{N}\Bigl\{(t_n - 1/2)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n - \lambda(\xi_n)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}\mathbf{w}\Bigr\} + \text{const}.
\]

We see that this is a quadratic function of w, and so the solution for q(w) will be Gaussian. Completing the square in the usual way, by matching the linear and quadratic terms in w against the Gaussian exponent $-\frac{1}{2}(\mathbf{w}-\boldsymbol{\mu}_N)^{\mathrm{T}}\boldsymbol{\Sigma}_N^{-1}(\mathbf{w}-\boldsymbol{\mu}_N)$, we obtain

\[
q(\mathbf{w}) = \mathcal{N}(\mathbf{w}\,|\,\boldsymbol{\mu}_N, \boldsymbol{\Sigma}_N) \tag{10.174}
\]

where we have defined

\begin{align}
\boldsymbol{\Sigma}_N^{-1}\boldsymbol{\mu}_N &= \sum_{n=1}^{N}(t_n - 1/2)\,\boldsymbol{\phi}_n \tag{10.175}\\
\boldsymbol{\Sigma}_N^{-1} &= \mathbb{E}[\alpha]\,\mathbf{I} + 2\sum_{n=1}^{N}\lambda(\xi_n)\,\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}. \tag{10.176}
\end{align}
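As a concrete illustration of (10.175)–(10.176), the following is a minimal NumPy sketch of the q(w) update. It assumes a design matrix Phi of shape (N, M), binary targets t in {0, 1}, current variational parameters xi (one per data point), and the current value of E[α]; the function and variable names are illustrative rather than taken from the text, and λ(ξ) = (σ(ξ) − 1/2)/(2ξ) is the function introduced in Section 10.6.1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lam(xi):
    # lambda(xi) = (sigma(xi) - 1/2) / (2 xi), with the limit 1/8 as xi -> 0.
    # xi: array of shape (N,)
    xi = np.asarray(xi, dtype=float)
    out = np.full_like(xi, 0.125)
    nz = np.abs(xi) > 1e-8
    out[nz] = (sigmoid(xi[nz]) - 0.5) / (2.0 * xi[nz])
    return out

def update_q_w(Phi, t, xi, E_alpha):
    """Update q(w) = N(w | mu_N, Sigma_N) via (10.175)-(10.176)."""
    N, M = Phi.shape
    # Sigma_N^{-1} = E[alpha] I + 2 sum_n lambda(xi_n) phi_n phi_n^T   (10.176)
    Lam = lam(xi)                                        # shape (N,)
    Sigma_inv = E_alpha * np.eye(M) + 2.0 * (Phi.T * Lam) @ Phi
    Sigma_N = np.linalg.inv(Sigma_inv)
    # Sigma_N^{-1} mu_N = sum_n (t_n - 1/2) phi_n        (10.175)
    mu_N = Sigma_N @ (Phi.T @ (t - 0.5))
    return mu_N, Sigma_N
```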

Similarly, the optimal solution for the factor q(α) is obtained from

\[
\ln q(\alpha) = \mathbb{E}_{\mathbf{w}}[\ln p(\mathbf{w}|\alpha)] + \ln p(\alpha) + \text{const}.
\]

Substituting for ln p(w|α) using (10.165), and for ln p(α) using (10.166), we obtain

\[
\ln q(\alpha) = \frac{M}{2}\ln\alpha - \frac{\alpha}{2}\,\mathbb{E}\bigl[\mathbf{w}^{\mathrm{T}}\mathbf{w}\bigr] + (a_0 - 1)\ln\alpha - b_0\alpha + \text{const}.
\]

We recognize this as the log of a gamma distribution, and so we obtain

\[
q(\alpha) = \mathrm{Gam}(\alpha\,|\,a_N, b_N) = \frac{1}{\Gamma(a_N)}\,b_N^{a_N}\,\alpha^{a_N - 1}e^{-b_N\alpha} \tag{10.177}
\]

where

\begin{align}
a_N &= a_0 + \frac{M}{2} \tag{10.178}\\
b_N &= b_0 + \frac{1}{2}\,\mathbb{E}_{\mathbf{w}}\bigl[\mathbf{w}^{\mathrm{T}}\mathbf{w}\bigr]. \tag{10.179}
\end{align}
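To close the loop, the q(α) update (10.178)–(10.179) can be sketched in the same hedged style. It relies on two standard expectations: E[α] = a_N / b_N for the gamma distribution (10.177), and E_w[wᵀw] = μ_Nᵀμ_N + Tr(Σ_N) under the Gaussian (10.174). The resulting E[α] feeds back into (10.176), so in practice this update is alternated with (10.175)–(10.176), together with the re-estimation of the ξ_n, until convergence; again the names below are illustrative.

```python
import numpy as np

def update_q_alpha(a0, b0, mu_N, Sigma_N):
    """Update q(alpha) = Gam(alpha | a_N, b_N) via (10.178)-(10.179)."""
    M = mu_N.shape[0]
    a_N = a0 + 0.5 * M                               # (10.178)
    # E_w[w^T w] = mu_N^T mu_N + Tr(Sigma_N) under q(w) = N(w | mu_N, Sigma_N)
    E_wTw = float(mu_N @ mu_N) + np.trace(Sigma_N)
    b_N = b0 + 0.5 * E_wTw                           # (10.179)
    E_alpha = a_N / b_N          # gamma mean; feeds back into (10.176)
    return a_N, b_N, E_alpha
```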
