504 10. APPROXIMATE INFERENCE
With this factorization we can appeal to the general result (10.9) to find expressions
for the optimal factors. Consider first the distribution $q(\mathbf{w})$. Discarding terms that
are independent of $\mathbf{w}$, we have
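$$
\begin{aligned}
\ln q(\mathbf{w}) &= \mathbb{E}_{\alpha}\left[\ln\{h(\mathbf{w},\boldsymbol{\xi})\,p(\mathbf{w}|\alpha)\,p(\alpha)\}\right] + \text{const} \\
&= \ln h(\mathbf{w},\boldsymbol{\xi}) + \mathbb{E}_{\alpha}\left[\ln p(\mathbf{w}|\alpha)\right] + \text{const}.
\end{aligned}
$$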
We now substitute for $\ln h(\mathbf{w},\boldsymbol{\xi})$ using (10.153), and for $\ln p(\mathbf{w}|\alpha)$ using (10.165),
giving
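$$
\ln q(\mathbf{w}) = -\frac{\mathbb{E}[\alpha]}{2}\,\mathbf{w}^{\mathrm{T}}\mathbf{w} + \sum_{n=1}^{N}\left\{(t_n - 1/2)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n - \lambda(\xi_n)\,\mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}\mathbf{w}\right\} + \text{const}.
$$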
We see that this is a quadratic function of $\mathbf{w}$ and so the solution for $q(\mathbf{w})$ will be
Gaussian. Completing the square in the usual way, we obtain
$$
q(\mathbf{w}) = \mathcal{N}(\mathbf{w}|\boldsymbol{\mu}_N, \boldsymbol{\Sigma}_N) \tag{10.174}
$$
where we have defined
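$$
\boldsymbol{\Sigma}_N^{-1}\boldsymbol{\mu}_N = \sum_{n=1}^{N}(t_n - 1/2)\,\boldsymbol{\phi}_n \tag{10.175}
$$
$$
\boldsymbol{\Sigma}_N^{-1} = \mathbb{E}[\alpha]\,\mathbf{I} + 2\sum_{n=1}^{N}\lambda(\xi_n)\,\boldsymbol{\phi}_n\boldsymbol{\phi}_n^{\mathrm{T}}. \tag{10.176}
$$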
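As an illustration of (10.175) and (10.176), the following is a minimal NumPy sketch of the update for $q(\mathbf{w})$. It assumes the design matrix `Phi` stores $\boldsymbol{\phi}_n^{\mathrm{T}}$ as its $n$th row, that the targets `t` take values in $\{0, 1\}$, and that $\lambda(\xi) = \{\sigma(\xi) - 1/2\}/(2\xi)$ as defined earlier in the chapter. The function names `lam` and `update_q_w` are hypothetical and chosen only for this example.

```python
import numpy as np

def lam(xi):
    """lambda(xi) = (sigmoid(xi) - 1/2) / (2 xi), written as tanh(xi/2) / (4 xi).

    The limit as xi -> 0 is 1/8, which is used directly for tiny |xi|
    to avoid dividing by zero.
    """
    xi = np.asarray(xi, dtype=float)
    tiny = np.abs(xi) < 1e-8
    safe = np.where(tiny, 1.0, xi)  # placeholder value where xi is ~0
    return np.where(tiny, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))

def update_q_w(Phi, t, xi, E_alpha):
    """Return (mu_N, Sigma_N) of q(w) from (10.175) and (10.176).

    Phi     : (N, M) array, row n is phi_n^T   (assumed layout)
    t       : (N,) array of targets in {0, 1}
    xi      : (N,) array of variational parameters xi_n
    E_alpha : scalar, the current value of E[alpha]
    """
    N, M = Phi.shape
    # Sigma_N^{-1} = E[alpha] I + 2 sum_n lambda(xi_n) phi_n phi_n^T
    precision = E_alpha * np.eye(M) + 2.0 * (Phi.T * lam(xi)) @ Phi
    Sigma_N = np.linalg.inv(precision)
    # Sigma_N^{-1} mu_N = sum_n (t_n - 1/2) phi_n
    mu_N = Sigma_N @ (Phi.T @ (t - 0.5))
    return mu_N, Sigma_N
```

For larger $M$ one would probably solve the linear system with `np.linalg.solve` rather than form the inverse explicitly; the inverse is kept here because $\boldsymbol{\Sigma}_N$ itself is needed for the expectation $\mathbb{E}[\mathbf{w}^{\mathrm{T}}\mathbf{w}]$.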
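Similarly, the optimal solution for the factor $q(\alpha)$ is obtained from
$$
\ln q(\alpha) = \mathbb{E}_{\mathbf{w}}\left[\ln p(\mathbf{w}|\alpha)\right] + \ln p(\alpha) + \text{const}.
$$
Substituting for $\ln p(\mathbf{w}|\alpha)$ using (10.165), and for $\ln p(\alpha)$ using (10.166), we obtain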
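$$
\ln q(\alpha) = \frac{M}{2}\ln\alpha - \frac{\alpha}{2}\,\mathbb{E}\left[\mathbf{w}^{\mathrm{T}}\mathbf{w}\right] + (a_0 - 1)\ln\alpha - b_0\alpha + \text{const}.
$$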
We recognize this as the log of a gamma distribution, and so we obtain
$$
q(\alpha) = \mathrm{Gam}(\alpha|a_N, b_N) = \frac{1}{\Gamma(a_N)}\,b_N^{a_N}\,\alpha^{a_N - 1}\,e^{-b_N\alpha} \tag{10.177}
$$
where
$$
a_N = a_0 + \frac{M}{2} \tag{10.178}
$$
$$
b_N = b_0 + \frac{1}{2}\,\mathbb{E}_{\mathbf{w}}\left[\mathbf{w}^{\mathrm{T}}\mathbf{w}\right]. \tag{10.179}
$$
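The update (10.178) and (10.179) can be sketched in the same way. The example below relies on two standard results that are not restated above: for a Gaussian $q(\mathbf{w}) = \mathcal{N}(\mathbf{w}|\boldsymbol{\mu}_N, \boldsymbol{\Sigma}_N)$ we have $\mathbb{E}[\mathbf{w}^{\mathrm{T}}\mathbf{w}] = \boldsymbol{\mu}_N^{\mathrm{T}}\boldsymbol{\mu}_N + \mathrm{Tr}(\boldsymbol{\Sigma}_N)$, and the mean of $\mathrm{Gam}(\alpha|a_N, b_N)$ is $a_N/b_N$. The function name `update_q_alpha` is hypothetical.

```python
import numpy as np

def update_q_alpha(mu_N, Sigma_N, a0, b0):
    """Return (a_N, b_N, E[alpha]) for q(alpha) = Gam(alpha | a_N, b_N).

    mu_N, Sigma_N : mean (M,) and covariance (M, M) of the current q(w)
    a0, b0        : hyperparameters of the gamma prior over alpha
    """
    M = mu_N.shape[0]
    # E[w^T w] = mu_N^T mu_N + Tr(Sigma_N) under a Gaussian q(w)
    E_wTw = mu_N @ mu_N + np.trace(Sigma_N)
    a_N = a0 + 0.5 * M          # (10.178)
    b_N = b0 + 0.5 * E_wTw      # (10.179)
    return a_N, b_N, a_N / b_N  # last entry is the mean of the gamma
```

Because $q(\mathbf{w})$ depends on $\mathbb{E}[\alpha]$ while $q(\alpha)$ depends on $\mathbb{E}[\mathbf{w}^{\mathrm{T}}\mathbf{w}]$, the two factors are coupled, and in practice the returned $\mathbb{E}[\alpha]$ would be fed back into the update for $q(\mathbf{w})$ and the two steps alternated.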