
The regression function is the conditional expectation of the target variable conditioned on the input variable, which is given by
\[
y(\mathbf{x}) = \mathbb{E}[t \mid \mathbf{x}]
= \int_{-\infty}^{\infty} t\, p(t \mid \mathbf{x})\, \mathrm{d}t
= \frac{\int t\, p(\mathbf{x}, t)\, \mathrm{d}t}{\int p(\mathbf{x}, t)\, \mathrm{d}t}
= \frac{\sum_n \int t\, f(\mathbf{x} - \mathbf{x}_n,\, t - t_n)\, \mathrm{d}t}{\sum_m \int f(\mathbf{x} - \mathbf{x}_m,\, t - t_m)\, \mathrm{d}t}
\tag{6.43}
\]
where the last step substitutes the Parzen density estimate of the joint distribution, $p(\mathbf{x}, t) = \frac{1}{N} \sum_n f(\mathbf{x} - \mathbf{x}_n,\, t - t_n)$, with one component density $f$ centred on each data point; the factors of $1/N$ cancel between numerator and denominator.

We now assume for simplicity that the component density functions have zero mean so that
\[
\int_{-\infty}^{\infty} f(\mathbf{x}, t)\, t\, \mathrm{d}t = 0
\tag{6.44}
\]

for all values of $\mathbf{x}$. Using a simple change of variable, we then obtain
\[
y(\mathbf{x}) = \frac{\sum_n g(\mathbf{x} - \mathbf{x}_n)\, t_n}{\sum_m g(\mathbf{x} - \mathbf{x}_m)}
= \sum_n k(\mathbf{x}, \mathbf{x}_n)\, t_n
\tag{6.45}
\]

where $n, m = 1, \dots, N$ and the kernel function $k(\mathbf{x}, \mathbf{x}_n)$ is given by
\[
k(\mathbf{x}, \mathbf{x}_n) = \frac{g(\mathbf{x} - \mathbf{x}_n)}{\sum_m g(\mathbf{x} - \mathbf{x}_m)}
\tag{6.46}
\]

and we have defined
\[
g(\mathbf{x}) = \int_{-\infty}^{\infty} f(\mathbf{x}, t)\, \mathrm{d}t.
\tag{6.47}
\]
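For concreteness (this particular choice is an illustration, not fixed by the text at this point), suppose the component density is a zero-mean isotropic Gaussian, $f(\mathbf{x}, t) = \mathcal{N}(\mathbf{x} \mid \mathbf{0}, \sigma^2 \mathbf{I})\, \mathcal{N}(t \mid 0, \sigma^2)$. Then (6.44) holds automatically, the marginal (6.47) is $g(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \mathbf{0}, \sigma^2 \mathbf{I})$, and the normalizing constants cancel in (6.46) to give
\[
k(\mathbf{x}, \mathbf{x}_n) = \frac{\exp\!\left(-\|\mathbf{x} - \mathbf{x}_n\|^2 / 2\sigma^2\right)}{\sum_m \exp\!\left(-\|\mathbf{x} - \mathbf{x}_m\|^2 / 2\sigma^2\right)}
\]
so that (6.45) becomes a smoothly weighted average of the targets $t_n$, with weights that decay with distance from $\mathbf{x}$.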

The result (6.45) is known as theNadaraya-Watsonmodel, orkernel regression
(Nadaraya, 1964; Watson, 1964). For a localized kernel function, it has the prop-
erty of giving more weight to the data pointsxnthat are close tox. Note that the
kernel (6.46) satisfies the summation constraint

∑N

n=1

k(x,xn)=1.
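The following is a minimal Python sketch of (6.45)-(6.47) for scalar inputs, assuming the Gaussian choice of $g$ described above; the function name, the bandwidth parameter `h` (playing the role of $\sigma$), and the toy sine-curve data are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, t_train, h=0.3):
    """Nadaraya-Watson kernel regression, eq. (6.45), with Gaussian g.

    x_query: (Q,) points at which to evaluate y(x)
    x_train: (N,) training inputs x_n
    t_train: (N,) training targets t_n
    h:       bandwidth, playing the role of sigma (assumed value)
    """
    # g(x - x_n) for every (query, training) pair; shape (Q, N).
    diff = x_query[:, None] - x_train[None, :]
    g = np.exp(-0.5 * (diff / h) ** 2)

    # k(x, x_n) = g(x - x_n) / sum_m g(x - x_m), eq. (6.46).
    k = g / g.sum(axis=1, keepdims=True)

    # Summation constraint: the weights sum to one at every query point.
    assert np.allclose(k.sum(axis=1), 1.0)

    # y(x) = sum_n k(x, x_n) t_n, eq. (6.45).
    return k @ t_train

# Toy data (illustrative): noisy samples of a sine curve.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=20)
t_train = np.sin(2.0 * np.pi * x_train) + 0.1 * rng.normal(size=20)

x_query = np.linspace(0.0, 1.0, 5)
print(nadaraya_watson(x_query, x_train, t_train))
```

In this sketch the bandwidth `h` controls the locality of the kernel: a small value makes $y(\mathbf{x})$ track the nearest targets closely, while a large value averages over many data points.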