
[Figure omitted: six panels of sample functions over x ∈ [−1, 1], with panel titles (1.00, 4.00, 0.00, 0.00), (9.00, 4.00, 0.00, 0.00), (1.00, 64.00, 0.00, 0.00), (1.00, 0.25, 0.00, 0.00), (1.00, 4.00, 10.00, 0.00), and (1.00, 4.00, 0.00, 5.00).]

Figure 6.5 Samples from a Gaussian process prior defined by the covariance function (6.63). The title above each plot denotes (θ_0, θ_1, θ_2, θ_3).
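Such samples are straightforward to generate numerically. The following is a minimal sketch (not from the text), assuming the covariance function (6.63) takes the form k(x_n, x_m) = θ_0 exp(−θ_1 (x_n − x_m)^2 / 2) + θ_2 + θ_3 x_n x_m for scalar inputs; the function name kernel_663, the grid size, and the jitter value are illustrative choices:

```python
import numpy as np

def kernel_663(xn, xm, theta):
    """Covariance (6.63), assumed form for scalar inputs:
    th0 * exp(-th1/2 * (xn - xm)^2) + th2 + th3 * xn * xm."""
    th0, th1, th2, th3 = theta
    return th0 * np.exp(-0.5 * th1 * (xn - xm) ** 2) + th2 + th3 * xn * xm

# Input grid over [-1, 1], matching the horizontal axis of Figure 6.5.
x = np.linspace(-1.0, 1.0, 200)

# Gram matrix for one hyperparameter setting, e.g. the first panel (1.00, 4.00, 0.00, 0.00).
K = kernel_663(x[:, None], x[None, :], (1.0, 4.0, 0.0, 0.0))
K += 1e-10 * np.eye(len(x))  # small jitter for numerical stability

# Draw five functions from the zero-mean Gaussian process prior N(0, K).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=5)
```

Varying the tuple passed to kernel_663 reproduces the qualitative behaviour of the remaining panels.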


c = k(x_{N+1}, x_{N+1}) + β^{-1}. Using the results (2.81) and (2.82), we see that the conditional distribution p(t_{N+1}|t) is a Gaussian distribution with mean and covariance given by

    m(x_{N+1}) = k^T C_N^{-1} t                  (6.66)
    σ^2(x_{N+1}) = c − k^T C_N^{-1} k.           (6.67)

These are the key results that define Gaussian process regression. Because the vector k is a function of the test-point input value x_{N+1}, we see that the predictive distribution is a Gaussian whose mean and variance both depend on x_{N+1}. An example of Gaussian process regression is shown in Figure 6.8.
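A minimal sketch of these predictive equations in code (not from the text) is given below; the name gp_predict and the squared-exponential example kernel are illustrative, and the kernel is supplied by the caller:

```python
import numpy as np

def gp_predict(X, t, x_star, kernel, beta):
    """Predictive mean (6.66) and variance (6.67) at a single test input x_star."""
    N = len(X)
    K = np.array([[kernel(xn, xm) for xm in X] for xn in X])
    C_N = K + np.eye(N) / beta             # C_N = K + beta^{-1} I, as in (6.62)
    k = np.array([kernel(xn, x_star) for xn in X])
    c = kernel(x_star, x_star) + 1.0 / beta

    # Solve linear systems rather than forming C_N^{-1} explicitly.
    mean = k @ np.linalg.solve(C_N, t)     # m(x_{N+1}) = k^T C_N^{-1} t
    var = c - k @ np.linalg.solve(C_N, k)  # sigma^2(x_{N+1}) = c - k^T C_N^{-1} k
    return mean, var

# Example: three training points, a squared-exponential kernel, noise precision beta = 25.
X = np.array([-0.5, 0.0, 0.7])
t = np.sin(3.0 * X)
sq_exp = lambda a, b: np.exp(-2.0 * (a - b) ** 2)
mean, var = gp_predict(X, t, 0.3, sq_exp, beta=25.0)
```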
The only restriction on the kernel function is that the covariance matrix given by (6.62) must be positive definite. If λ_i is an eigenvalue of K, then the corresponding eigenvalue of C will be λ_i + β^{-1}. It is therefore sufficient that the kernel matrix k(x_n, x_m) be positive semidefinite for any pair of points x_n and x_m, so that λ_i ≥ 0, because any eigenvalue λ_i that is zero will still give rise to a positive eigenvalue for C because β > 0. This is the same restriction on the kernel function discussed earlier, and so we can again exploit all of the techniques in Section 6.2 to construct suitable kernels.
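This eigenvalue-shift argument is easy to verify numerically. The sketch below (illustrative, not from the text) builds a rank-deficient positive semidefinite matrix K and checks that the eigenvalues of C = K + β^{-1} I are exactly those of K shifted by β^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 10.0

# Rank-deficient Gram matrix: positive semidefinite with two zero eigenvalues.
A = rng.standard_normal((5, 3))
K = A @ A.T

lam_K = np.linalg.eigvalsh(K)                     # eigenvalues of K (ascending)
lam_C = np.linalg.eigvalsh(K + np.eye(5) / beta)  # eigenvalues of C = K + beta^{-1} I

# Adding beta^{-1} I shifts every eigenvalue by beta^{-1}, so C is strictly
# positive definite even though K is only positive semidefinite.
print(np.allclose(lam_C, lam_K + 1.0 / beta))     # True
```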