1.2. Probability Theory 31
In the curve fitting problem, we are given the training data $\mathbf{x}$ and $\mathbf{t}$, along with a new test point $x$, and our goal is to predict the value of $t$. We therefore wish to evaluate the predictive distribution $p(t|x, \mathbf{x}, \mathbf{t})$. Here we shall assume that the parameters $\alpha$ and $\beta$ are fixed and known in advance (in later chapters we shall discuss how such parameters can be inferred from data in a Bayesian setting).
A Bayesian treatment simply corresponds to a consistent application of the sum
and product rules of probability, which allow the predictive distribution to be written
in the form
$$
p(t|x, \mathbf{x}, \mathbf{t}) = \int p(t|x, \mathbf{w})\, p(\mathbf{w}|\mathbf{x}, \mathbf{t})\, \mathrm{d}\mathbf{w}. \tag{1.68}
$$
Here $p(t|x, \mathbf{w})$ is given by (1.60), and we have omitted the dependence on $\alpha$ and $\beta$ to simplify the notation. The factor $p(\mathbf{w}|\mathbf{x}, \mathbf{t})$ is the posterior distribution over parameters, and can be found by normalizing the right-hand side of (1.66). We shall see in Section 3.3 that, for problems such as the curve-fitting example, this posterior distribution is a Gaussian and can be evaluated analytically. Similarly, the integration in (1.68) can also be performed analytically, with the result that the predictive distribution is given by a Gaussian of the form
$$
p(t|x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\!\left(t \mid m(x), s^2(x)\right) \tag{1.69}
$$
where the mean and variance are given by
$$
m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n \tag{1.70}
$$
$$
s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S}\, \boldsymbol{\phi}(x). \tag{1.71}
$$
Here the matrix $\mathbf{S}$ is given by
$$
\mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathrm{T}} \tag{1.72}
$$
where $\mathbf{I}$ is the unit matrix, and we have defined the vector $\boldsymbol{\phi}(x)$ with elements $\phi_i(x) = x^i$ for $i = 0, \dots, M$.
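Equations (1.70)–(1.72) can be implemented directly in a few lines of NumPy. The following is a minimal sketch, not code from the text: the polynomial basis matches $\phi_i(x) = x^i$, but the training data and the values of $\alpha$, $\beta$, and $M$ used to exercise it are illustrative assumptions.

```python
import numpy as np

def phi(x, M):
    """Basis vector with elements phi_i(x) = x**i for i = 0, ..., M."""
    return np.array([x ** i for i in range(M + 1)])

def posterior_cov(xs, M, alpha, beta):
    """S from (1.72): S^{-1} = alpha*I + beta * sum_n phi(x_n) phi(x_n)^T."""
    S_inv = alpha * np.eye(M + 1)
    for xn in xs:
        p = phi(xn, M)
        S_inv += beta * np.outer(p, p)
    return np.linalg.inv(S_inv)

def predictive(x, xs, ts, M, alpha, beta):
    """Predictive mean m(x) from (1.70) and variance s^2(x) from (1.71)."""
    S = posterior_cov(xs, M, alpha, beta)
    b = sum(phi(xn, M) * tn for xn, tn in zip(xs, ts))  # sum_n phi(x_n) t_n
    m = beta * phi(x, M) @ S @ b
    s2 = 1.0 / beta + phi(x, M) @ S @ phi(x, M)
    return m, s2
```

Because $\mathbf{S}$ is positive definite, the quadratic form $\boldsymbol{\phi}(x)^{\mathrm{T}} \mathbf{S}\, \boldsymbol{\phi}(x)$ is strictly positive, so $s^2(x)$ always exceeds the noise floor $\beta^{-1}$.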
We see that the variance, as well as the mean, of the predictive distribution in (1.69) is dependent on $x$. The first term in (1.71) represents the uncertainty in the predicted value of $t$ due to the noise on the target variables and was expressed already in the maximum likelihood predictive distribution (1.64) through $\beta_{\mathrm{ML}}^{-1}$. However, the second term arises from the uncertainty in the parameters $\mathbf{w}$ and is a consequence of the Bayesian treatment. The predictive distribution for the synthetic sinusoidal regression problem is illustrated in Figure 1.17.
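The predictive integral (1.68) can also be checked by Monte Carlo: drawing samples of $\mathbf{w}$ from the Gaussian posterior and computing $\boldsymbol{\phi}(x)^{\mathrm{T}}\mathbf{w}$ per sample reproduces the closed-form mean and variance, with the sample variance of the draws corresponding exactly to the parameter-uncertainty term of (1.71). In the sketch below, the posterior mean $\beta\mathbf{S}\sum_n \boldsymbol{\phi}(x_n)t_n$ and covariance $\mathbf{S}$ are the standard Gaussian-posterior results anticipated in Section 3.3, and the values of $\alpha$, $\beta$, $M$ and the synthetic sinusoidal data are illustrative stand-ins for those behind Figure 1.17.

```python
import numpy as np

# Monte Carlo check of (1.68): sample w from the Gaussian posterior
# p(w|x,t) and compare sample statistics of phi(x)^T w with the closed
# form (1.70)-(1.71).  alpha, beta, M and the data are assumptions.
rng = np.random.default_rng(1)
alpha, beta, M = 5e-3, 11.1, 3
xs = np.linspace(0.0, 1.0, 10)
ts = np.sin(2 * np.pi * xs) + 0.1 * rng.standard_normal(10)

def phi(x):
    return np.array([x ** i for i in range(M + 1)])

# S from (1.72), and the Gaussian posterior mean (cf. Section 3.3)
S = np.linalg.inv(alpha * np.eye(M + 1)
                  + beta * sum(np.outer(phi(x), phi(x)) for x in xs))
m_N = beta * S @ sum(phi(x) * t for x, t in zip(xs, ts))

x_star = 0.5
w = rng.multivariate_normal(m_N, S, size=20000)  # draws from p(w|x,t)
y = w @ phi(x_star)                              # phi(x*)^T w per sample

mc_mean = y.mean()                # approximates m(x*) from (1.70)
mc_var = y.var() + 1.0 / beta     # parameter uncertainty + noise, (1.71)
exact_mean = beta * phi(x_star) @ S @ sum(phi(x) * t
                                          for x, t in zip(xs, ts))
exact_var = 1.0 / beta + phi(x_star) @ S @ phi(x_star)
```

With 20,000 samples the Monte Carlo estimates agree closely with the analytic results, illustrating that the Bayesian treatment's extra variance is precisely the spread of predictions across plausible parameter values.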