10.3. Variational Linear Regression 489
where we have evaluated the integral by making use of the result (2.115) for the linear-Gaussian model. Here the input-dependent variance is given by

\sigma^2(x) = \frac{1}{\beta} + \phi(x)^T S_N \phi(x). \qquad (10.106)

Note that this takes the same form as the result (3.59) obtained with fixed \alpha, except that now the expected value E[\alpha] appears in the definition of S_N.
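The predictive variance (10.106) is straightforward to evaluate numerically. The following is a minimal sketch, assuming a polynomial basis \phi(x) = (1, x, \ldots, x^M)^T and a posterior covariance S_N already obtained from the variational updates; the function and variable names here are illustrative, not taken from the book.

```python
import numpy as np

def poly_basis(x, M):
    """Polynomial basis vector phi(x) = (1, x, ..., x^M), length M + 1."""
    return np.array([x**j for j in range(M + 1)])

def predictive_variance(x, beta, S_N, M):
    """Input-dependent variance (10.106): 1/beta + phi(x)^T S_N phi(x)."""
    phi = poly_basis(x, M)
    return 1.0 / beta + phi @ S_N @ phi
```

The first term is the noise variance of the likelihood, while the quadratic form adds the uncertainty in the weights, which grows for inputs far from the training data.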
10.3.3 Lower bound
Another quantity of importance is the lower bound \mathcal{L} defined by

\mathcal{L}(q) = E[\ln p(w, \alpha, t)] - E[\ln q(w, \alpha)]
             = E_w[\ln p(t|w)] + E_{w,\alpha}[\ln p(w|\alpha)] + E_\alpha[\ln p(\alpha)]
               - E_w[\ln q(w)] - E_\alpha[\ln q(\alpha)]. \qquad (10.107)
Evaluation of the various terms is straightforward, making use of results obtained in previous chapters (Exercise 10.27), and gives
E_w[\ln p(t|w)] = \frac{N}{2} \ln\!\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2} t^T t + \beta m_N^T \Phi^T t - \frac{\beta}{2} \mathrm{Tr}\!\left[\Phi^T \Phi \left(m_N m_N^T + S_N\right)\right] \qquad (10.108)
E_{w,\alpha}[\ln p(w|\alpha)] = -\frac{M}{2} \ln(2\pi) + \frac{M}{2} \left(\psi(a_N) - \ln b_N\right) - \frac{a_N}{2 b_N} \left[m_N^T m_N + \mathrm{Tr}(S_N)\right] \qquad (10.109)
E_\alpha[\ln p(\alpha)] = a_0 \ln b_0 + (a_0 - 1)\left[\psi(a_N) - \ln b_N\right] - b_0 \frac{a_N}{b_N} - \ln \Gamma(a_0) \qquad (10.110)
-E_w[\ln q(w)] = \frac{1}{2} \ln |S_N| + \frac{M}{2} \left[1 + \ln(2\pi)\right] \qquad (10.111)
-E_\alpha[\ln q(\alpha)] = \ln \Gamma(a_N) - (a_N - 1)\psi(a_N) - \ln b_N + a_N. \qquad (10.112)
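Summing these five terms gives the bound \mathcal{L}(q) once the variational posterior parameters m_N, S_N, a_N, b_N are available. The sketch below is a minimal illustration, assuming the factorized posterior of this section with q(w) = \mathcal{N}(w|m_N, S_N) and q(\alpha) = \mathrm{Gam}(\alpha|a_N, b_N); the hand-rolled digamma approximation and all names are mine, not the book's. Note that setting a_0 = b_0 = 0 exactly would make \ln b_0 and \ln \Gamma(a_0) diverge in code, so small positive values stand in for the noninformative limit.

```python
import math
import numpy as np

def digamma(x):
    """Approximate psi(x) via recurrence plus an asymptotic expansion."""
    r = 0.0
    while x < 6.0:                 # shift upward using psi(x+1) = psi(x) + 1/x
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0/12 - f * (1.0/120 - f/252))

def lower_bound(t, Phi, beta, m_N, S_N, a0, b0, a_N, b_N):
    """Evaluate L(q) by summing the terms (10.108)-(10.112)."""
    N, M = Phi.shape
    E_alpha = a_N / b_N                          # E[alpha] under q(alpha)
    E_ln_alpha = digamma(a_N) - math.log(b_N)    # E[ln alpha]
    # (10.108)  E_w[ln p(t|w)]
    lik = (N/2) * math.log(beta / (2*math.pi)) \
        - (beta/2) * (t @ t) + beta * (m_N @ Phi.T @ t) \
        - (beta/2) * np.trace(Phi.T @ Phi @ (np.outer(m_N, m_N) + S_N))
    # (10.109)  E_{w,alpha}[ln p(w|alpha)]
    prior_w = -(M/2) * math.log(2*math.pi) + (M/2) * E_ln_alpha \
        - (E_alpha/2) * (m_N @ m_N + np.trace(S_N))
    # (10.110)  E_alpha[ln p(alpha)]
    prior_a = a0 * math.log(b0) + (a0 - 1) * E_ln_alpha \
        - b0 * E_alpha - math.lgamma(a0)
    # (10.111)  entropy of the Gaussian q(w)
    ent_w = 0.5 * np.linalg.slogdet(S_N)[1] + (M/2) * (1 + math.log(2*math.pi))
    # (10.112)  entropy of the Gamma q(alpha)
    ent_a = math.lgamma(a_N) - (a_N - 1) * digamma(a_N) \
        - math.log(b_N) + a_N
    return lik + prior_w + prior_a + ent_w + ent_a
```

Because the variational updates perform coordinate ascent on \mathcal{L}(q), evaluating this bound after each iteration gives a useful convergence check: the sequence of values should be non-decreasing up to numerical precision.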
Figure 10.9 shows a plot of the lower bound \mathcal{L}(q) versus the degree of a polynomial model for a synthetic data set generated from a degree-three polynomial. Here the prior parameters have been set to a_0 = b_0 = 0, corresponding to the noninformative prior p(\alpha) \propto 1/\alpha, which is uniform over \ln \alpha as discussed in Section 2.3.6. As we saw in Section 10.1, the quantity \mathcal{L} represents a lower bound on the log marginal likelihood p(t|M) for the model. If we assign equal prior probabilities p(M) to the different values of M, then we can interpret \mathcal{L} as an approximation to the posterior model probability p(M|t). Thus the variational framework assigns the highest probability to the model with M = 3. This should be contrasted with the maximum likelihood result, which assigns ever smaller residual error to models of increasing complexity until the residual error is driven to zero, causing maximum likelihood to favour severely over-fitted models.