
[Figure omitted: six panels of sample functions over x ∈ [−1, 1], with panel titles (1.00, 4.00, 0.00, 0.00), (9.00, 4.00, 0.00, 0.00), (1.00, 64.00, 0.00, 0.00), (1.00, 0.25, 0.00, 0.00), (1.00, 4.00, 10.00, 0.00), and (1.00, 4.00, 0.00, 5.00).]

Figure 6.5 Samples from a Gaussian process prior defined by the covariance function (6.63). The title above each plot denotes (θ_0, θ_1, θ_2, θ_3).
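Such samples are straightforward to generate numerically. The following is a minimal sketch (not from the text), assuming the covariance function (6.63) takes the form k(x_n, x_m) = θ_0 exp(−θ_1 (x_n − x_m)^2 / 2) + θ_2 + θ_3 x_n x_m for scalar inputs; the function name kernel_663, the grid size, and the jitter value are illustrative choices:

```python
import numpy as np

def kernel_663(xn, xm, theta):
    """Covariance (6.63), assumed form for scalar inputs:
    th0 * exp(-th1/2 * (xn - xm)^2) + th2 + th3 * xn * xm."""
    th0, th1, th2, th3 = theta
    return th0 * np.exp(-0.5 * th1 * (xn - xm) ** 2) + th2 + th3 * xn * xm

# Input grid over [-1, 1], matching the horizontal axis of Figure 6.5.
x = np.linspace(-1.0, 1.0, 200)

# Gram matrix for one hyperparameter setting, e.g. the first panel (1.00, 4.00, 0.00, 0.00).
K = kernel_663(x[:, None], x[None, :], (1.0, 4.0, 0.0, 0.0))
K += 1e-10 * np.eye(len(x))  # small jitter for numerical stability

# Draw five functions from the zero-mean Gaussian process prior N(0, K).
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=5)
```

Varying the tuple passed to kernel_663 reproduces the qualitative behaviour of the remaining panels.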


c = k(x_{N+1}, x_{N+1}) + β^{-1}. Using the results (2.81) and (2.82), we see that the conditional distribution p(t_{N+1}|t) is a Gaussian distribution with mean and covariance given by

    m(x_{N+1}) = k^T C_N^{-1} t                  (6.66)
    σ^2(x_{N+1}) = c − k^T C_N^{-1} k.           (6.67)

These are the key results that define Gaussian process regression. Because the vector k is a function of the test-point input value x_{N+1}, we see that the predictive distribution is a Gaussian whose mean and variance both depend on x_{N+1}. An example of Gaussian process regression is shown in Figure 6.8.
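A minimal sketch of these predictive equations in code (not from the text) is given below; the name gp_predict and the squared-exponential example kernel are illustrative, and the kernel is supplied by the caller:

```python
import numpy as np

def gp_predict(X, t, x_star, kernel, beta):
    """Predictive mean (6.66) and variance (6.67) at a single test input x_star."""
    N = len(X)
    K = np.array([[kernel(xn, xm) for xm in X] for xn in X])
    C_N = K + np.eye(N) / beta             # C_N = K + beta^{-1} I, as in (6.62)
    k = np.array([kernel(xn, x_star) for xn in X])
    c = kernel(x_star, x_star) + 1.0 / beta

    # Solve linear systems rather than forming C_N^{-1} explicitly.
    mean = k @ np.linalg.solve(C_N, t)     # m(x_{N+1}) = k^T C_N^{-1} t
    var = c - k @ np.linalg.solve(C_N, k)  # sigma^2(x_{N+1}) = c - k^T C_N^{-1} k
    return mean, var

# Example: three training points, a squared-exponential kernel, noise precision beta = 25.
X = np.array([-0.5, 0.0, 0.7])
t = np.sin(3.0 * X)
sq_exp = lambda a, b: np.exp(-2.0 * (a - b) ** 2)
mean, var = gp_predict(X, t, 0.3, sq_exp, beta=25.0)
```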
The only restriction on the kernel function is that the covariance matrix given by (6.62) must be positive definite. If λ_i is an eigenvalue of K, then the corresponding eigenvalue of C will be λ_i + β^{-1}. It is therefore sufficient that the kernel matrix k(x_n, x_m) be positive semidefinite for any pair of points x_n and x_m, so that λ_i ≥ 0, because any eigenvalue λ_i that is zero will still give rise to a positive eigenvalue for C because β > 0. This is the same restriction on the kernel function discussed earlier, and so we can again exploit all of the techniques in Section 6.2 to construct suitable kernels.
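This eigenvalue-shift argument is easy to verify numerically. The sketch below (illustrative, not from the text) builds a rank-deficient positive semidefinite matrix K and checks that the eigenvalues of C = K + β^{-1} I are exactly those of K shifted by β^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 10.0

# Rank-deficient Gram matrix: positive semidefinite with two zero eigenvalues.
A = rng.standard_normal((5, 3))
K = A @ A.T

lam_K = np.linalg.eigvalsh(K)                     # eigenvalues of K (ascending)
lam_C = np.linalg.eigvalsh(K + np.eye(5) / beta)  # eigenvalues of C = K + beta^{-1} I

# Adding beta^{-1} I shifts every eigenvalue by beta^{-1}, so C is strictly
# positive definite even though K is only positive semidefinite.
print(np.allclose(lam_C, lam_K + 1.0 / beta))     # True
```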