# Pattern Recognition and Machine Learning

## 6.4. Gaussian Processes

$\mathbf{x}_1, \ldots, \mathbf{x}_N$. We are therefore interested in the joint distribution of the function values $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$, which we denote by the vector $\mathbf{y}$ with elements $y_n = y(\mathbf{x}_n)$ for $n = 1, \ldots, N$. From (6.49), this vector is given by

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{w} \tag{6.51}$$

where $\boldsymbol{\Phi}$ is the design matrix with elements $\Phi_{nk} = \phi_k(\mathbf{x}_n)$. We can find the probability distribution of $\mathbf{y}$ as follows. First of all we note that $\mathbf{y}$ is a linear combination of Gaussian distributed variables given by the elements of $\mathbf{w}$ and hence is itself Gaussian (Exercise 2.31). We therefore need only find its mean and covariance, which are given from (6.50) by

$$\mathbb{E}[\mathbf{y}] = \boldsymbol{\Phi}\,\mathbb{E}[\mathbf{w}] = \mathbf{0} \tag{6.52}$$

$$\operatorname{cov}[\mathbf{y}] = \mathbb{E}\!\left[\mathbf{y}\mathbf{y}^{\mathrm{T}}\right] = \boldsymbol{\Phi}\,\mathbb{E}\!\left[\mathbf{w}\mathbf{w}^{\mathrm{T}}\right]\boldsymbol{\Phi}^{\mathrm{T}} = \frac{1}{\alpha}\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}} = K \tag{6.53}$$

where $K$ is the Gram matrix with elements

$$K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) = \frac{1}{\alpha}\,\boldsymbol{\phi}(\mathbf{x}_n)^{\mathrm{T}}\boldsymbol{\phi}(\mathbf{x}_m) \tag{6.54}$$
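As a concrete numerical check of this construction, the Gram matrix can be computed from a design matrix and compared entry-by-entry with the kernel form. The sketch below uses Gaussian basis functions with arbitrary centres and widths; all specific numbers are illustrative assumptions, not values from the text:

```python
import numpy as np

alpha = 2.0                           # precision of the weight prior p(w | alpha)
centres = np.linspace(-1.0, 1.0, 5)   # centres of 5 illustrative basis functions

def phi(x):
    # Feature vector phi(x): Gaussian bumps at fixed centres (an arbitrary choice).
    return np.exp(-0.5 * ((x - centres) / 0.2) ** 2)

x = np.linspace(-1.0, 1.0, 4)           # N = 4 input points
Phi = np.stack([phi(xn) for xn in x])   # design matrix, Phi[n, k] = phi_k(x_n)

K = Phi @ Phi.T / alpha                 # Gram matrix, equation (6.53)

# Each entry agrees with the kernel form (6.54): K_nm = phi(x_n)^T phi(x_m) / alpha
assert np.isclose(K[1, 2], phi(x[1]) @ phi(x[2]) / alpha)
```

Note that forming $K$ as $\boldsymbol{\Phi}\boldsymbol{\Phi}^{\mathrm{T}}/\alpha$ and evaluating (6.54) entrywise give identical results; the matrix form is simply the vectorized computation.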

and $k(\mathbf{x}, \mathbf{x}')$ is the kernel function.

This model provides us with a particular example of a Gaussian process. In general, a Gaussian process is defined as a probability distribution over functions $y(\mathbf{x})$ such that the set of values of $y(\mathbf{x})$ evaluated at an arbitrary set of points $\mathbf{x}_1, \ldots, \mathbf{x}_N$ jointly have a Gaussian distribution. In cases where the input vector $\mathbf{x}$ is two-dimensional, this may also be known as a *Gaussian random field*. More generally, a stochastic process $y(\mathbf{x})$ is specified by giving the joint probability distribution for any finite set of values $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$ in a consistent manner.
A key point about Gaussian stochastic processes is that the joint distribution over $N$ variables $y_1, \ldots, y_N$ is specified completely by the second-order statistics, namely the mean and the covariance. In most applications, we will not have any prior knowledge about the mean of $y(\mathbf{x})$ and so by symmetry we take it to be zero. This is equivalent to choosing the mean of the prior over weight values $p(\mathbf{w}|\alpha)$ to be zero in the basis function viewpoint. The specification of the Gaussian process is then completed by giving the covariance of $y(\mathbf{x})$ evaluated at any two values of $\mathbf{x}$, which is given by the kernel function

$$\mathbb{E}[y(\mathbf{x}_n)\,y(\mathbf{x}_m)] = k(\mathbf{x}_n, \mathbf{x}_m). \tag{6.55}$$

For the specific case of a Gaussian process defined by the linear regression model (6.49) with a weight prior (6.50), the kernel function is given by (6.54).
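This equivalence can be verified by Monte Carlo: drawing many weight vectors from the prior $\mathcal{N}(\mathbf{w}\,|\,\mathbf{0}, \alpha^{-1}\mathbf{I})$ and forming $\mathbf{y} = \boldsymbol{\Phi}\mathbf{w}$ for each, the empirical covariance of $\mathbf{y}$ should approach $K$. The polynomial basis and all constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha = 2.0
x = np.linspace(-1.0, 1.0, 3)
# Polynomial basis functions phi_k(x) = x^k (an arbitrary illustrative choice).
Phi = np.stack([x ** k for k in range(4)], axis=1)

K = Phi @ Phi.T / alpha                 # exact covariance from (6.53)

# Sample many weight vectors w ~ N(0, alpha^{-1} I) and form y = Phi w for each.
W = rng.normal(scale=alpha ** -0.5, size=(200_000, Phi.shape[1]))
Y = W @ Phi.T                           # each row is one sampled (y_1, ..., y_N)

emp_cov = Y.T @ Y / len(Y)              # empirical E[y y^T]; the prior mean is zero
# emp_cov is close to K, illustrating E[y(x_n) y(x_m)] = k(x_n, x_m) as in (6.55)
```

With 200,000 samples the empirical covariance typically matches $K$ to within a few hundredths in each entry.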
We can also define the kernel function directly, rather than indirectly through a choice of basis functions. Figure 6.4 shows samples of functions drawn from Gaussian processes for two different choices of kernel function. The first of these is a 'Gaussian' kernel of the form (6.23), and the second is the exponential kernel given by

$$k(x, x') = \exp\left(-\theta\,|x - x'|\right) \tag{6.56}$$

which corresponds to the *Ornstein-Uhlenbeck process* originally introduced by Uhlenbeck and Ornstein (1930) to describe Brownian motion.
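Samples of the kind shown in Figure 6.4 can be drawn by evaluating a kernel on a grid of points and sampling from the resulting multivariate Gaussian. The kernel parameters and the small jitter term added for numerical stability are assumptions of this sketch, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)          # grid on which to evaluate the functions

def gaussian_kernel(x1, x2, ell=0.1):
    # 'Gaussian' kernel of the form (6.23); the length-scale ell is illustrative.
    return np.exp(-0.5 * ((x1 - x2) / ell) ** 2)

def exponential_kernel(x1, x2, theta=10.0):
    # Exponential (Ornstein-Uhlenbeck) kernel, equation (6.56).
    return np.exp(-theta * np.abs(x1 - x2))

samples = {}
for kernel in (gaussian_kernel, exponential_kernel):
    K = kernel(x[:, None], x[None, :])  # covariance matrix on the grid
    K += 1e-6 * np.eye(len(x))          # jitter for numerical stability
    # Three sample paths from the zero-mean GP with this covariance.
    samples[kernel.__name__] = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```

The exponential-kernel samples are continuous but noticeably rougher than the Gaussian-kernel ones, reflecting the non-differentiability of Ornstein-Uhlenbeck sample paths.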