# Pattern Recognition and Machine Learning

##### 302 6. KERNEL METHODS

the input variable, which is given by

$$
y(x) = \mathbb{E}[t \mid x] = \int_{-\infty}^{\infty} t\, p(t \mid x)\, \mathrm{d}t
= \frac{\int t\, p(x, t)\, \mathrm{d}t}{\int p(x, t)\, \mathrm{d}t}
= \frac{\sum_n \int t\, f(x - x_n,\, t - t_n)\, \mathrm{d}t}{\sum_m \int f(x - x_m,\, t - t_m)\, \mathrm{d}t}.
\tag{6.43}
$$

We now assume for simplicity that the component density functions have zero mean so that

$$
\int_{-\infty}^{\infty} f(x, t)\, t\, \mathrm{d}t = 0
\tag{6.44}
$$

for all values of $x$. Using a simple change of variable, we then obtain

$$
y(x) = \frac{\sum_n g(x - x_n)\, t_n}{\sum_m g(x - x_m)}
= \sum_n k(x, x_n)\, t_n
\tag{6.45}
$$

where $n, m = 1, \dots, N$ and the kernel function $k(x, x_n)$ is given by

$$
k(x, x_n) = \frac{g(x - x_n)}{\sum_m g(x - x_m)}
\tag{6.46}
$$

and we have defined

$$
g(x) = \int_{-\infty}^{\infty} f(x, t)\, \mathrm{d}t.
\tag{6.47}
$$
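As a small numerical illustration (not from the text): for an isotropic zero-mean Gaussian choice of the component density $f(x, t)$, the zero-mean condition (6.44) holds and the marginal $g(x)$ of (6.47) is simply a one-dimensional Gaussian in $x$. The grid limits and the test point `x0` below are arbitrary assumptions.

```python
import numpy as np

def f(x, t, sigma=1.0):
    """Isotropic zero-mean Gaussian component density over (x, t)."""
    return np.exp(-(x**2 + t**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def trapezoid(y, x):
    """Trapezoidal rule, kept local to avoid NumPy version differences."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

t = np.linspace(-8.0, 8.0, 4001)  # wide enough that the Gaussian tails are negligible
x0 = 0.7                          # arbitrary test point

moment = trapezoid(f(x0, t) * t, t)                 # left-hand side of (6.44)
g_x0 = trapezoid(f(x0, t), t)                       # g(x0) from (6.47)
g_exact = np.exp(-x0**2 / 2) / np.sqrt(2 * np.pi)   # analytic Gaussian marginal
```

Because this component density is symmetric in $t$, the first moment vanishes as (6.44) requires, and the numerically integrated `g_x0` matches the analytic Gaussian marginal.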

The result (6.45) is known as the *Nadaraya-Watson* model, or *kernel regression* (Nadaraya, 1964; Watson, 1964). For a localized kernel function, it has the property of giving more weight to the data points $x_n$ that are close to $x$. Note that the kernel (6.46) satisfies the summation constraint

$$
\sum_{n=1}^{N} k(x, x_n) = 1.
$$
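Equations (6.45) and (6.46) translate directly into code. The following is a minimal sketch, assuming a Gaussian form for $g$; the bandwidth `h` and the noisy sinusoid used as toy data are illustrative assumptions, not from the text.

```python
import numpy as np

def kernel_weights(x, x_train, h=0.3):
    """Kernel k(x, x_n) of (6.46) for a Gaussian g; h is an assumed bandwidth."""
    g = np.exp(-0.5 * ((x - x_train) / h) ** 2)  # g(x - x_n)
    return g / g.sum()                           # normalized so the weights sum to 1

def nadaraya_watson(x, x_train, t_train, h=0.3):
    """Prediction y(x) = sum_n k(x, x_n) t_n from (6.45)."""
    return kernel_weights(x, x_train, h) @ t_train

# Toy data: noisy samples of a sinusoid (an assumption for illustration).
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 25)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(25)

y0 = nadaraya_watson(0.25, x_train, t_train)
```

Because the weights obey the summation constraint $\sum_n k(x, x_n) = 1$, the prediction is a convex combination of the targets $t_n$ and therefore always lies between the smallest and largest observed target.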