the input variable, which is given by
\[
y(\mathbf{x}) = \mathbb{E}[t \,|\, \mathbf{x}]
= \int_{-\infty}^{\infty} t\, p(t \,|\, \mathbf{x})\, \mathrm{d}t
= \frac{\displaystyle\int t\, p(\mathbf{x}, t)\, \mathrm{d}t}{\displaystyle\int p(\mathbf{x}, t)\, \mathrm{d}t}
= \frac{\displaystyle\sum_{n} \int t\, f(\mathbf{x} - \mathbf{x}_n,\, t - t_n)\, \mathrm{d}t}{\displaystyle\sum_{m} \int f(\mathbf{x} - \mathbf{x}_m,\, t - t_m)\, \mathrm{d}t}. \tag{6.43}
\]
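The last step substitutes a Parzen kernel density estimate of the joint distribution $p(\mathbf{x}, t)$, in which the common factor $1/N$ cancels between numerator and denominator:
\[
p(\mathbf{x}, t) = \frac{1}{N} \sum_{n=1}^{N} f(\mathbf{x} - \mathbf{x}_n,\, t - t_n).
\]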
We now assume for simplicity that the component density functions have zero mean so that
\[
\int_{-\infty}^{\infty} f(\mathbf{x}, t)\, t\, \mathrm{d}t = 0 \tag{6.44}
\]
for all values of $\mathbf{x}$. Using a simple change of variable, made explicit below, we then obtain
\[
y(\mathbf{x}) = \frac{\displaystyle\sum_{n} g(\mathbf{x} - \mathbf{x}_n)\, t_n}{\displaystyle\sum_{m} g(\mathbf{x} - \mathbf{x}_m)}
= \sum_{n} k(\mathbf{x}, \mathbf{x}_n)\, t_n \tag{6.45}
\]
where $n, m = 1, \ldots, N$ and the kernel function $k(\mathbf{x}, \mathbf{x}_n)$ is given by
\[
k(\mathbf{x}, \mathbf{x}_n) = \frac{g(\mathbf{x} - \mathbf{x}_n)}{\displaystyle\sum_{m} g(\mathbf{x} - \mathbf{x}_m)} \tag{6.46}
\]
and we have defined
\[
g(\mathbf{x}) = \int_{-\infty}^{\infty} f(\mathbf{x}, t)\, \mathrm{d}t. \tag{6.47}
\]
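To make the change of variable explicit (a routine expansion, spelled out here for completeness rather than taken from the text): substituting $u = t - t_n$ in each numerator integral of (6.43), and using the zero-mean assumption (6.44) together with the definition (6.47), gives
\[
\int t\, f(\mathbf{x} - \mathbf{x}_n,\, t - t_n)\, \mathrm{d}t
= \int (u + t_n)\, f(\mathbf{x} - \mathbf{x}_n,\, u)\, \mathrm{d}u
= 0 + t_n\, g(\mathbf{x} - \mathbf{x}_n),
\]
while the same substitution in each denominator integral gives $g(\mathbf{x} - \mathbf{x}_m)$, from which (6.45) follows.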
The result (6.45) is known as the \emph{Nadaraya-Watson} model, or \emph{kernel regression} (Nadaraya, 1964; Watson, 1964). For a localized kernel function, it has the property of giving more weight to the data points $\mathbf{x}_n$ that are close to $\mathbf{x}$. Note that the kernel (6.46) satisfies the summation constraint
\[
\sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n) = 1.
\]
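As a concrete illustration, not taken from the text, here is a minimal Python sketch of the Nadaraya-Watson model under the assumption that the component density $f$ is an isotropic Gaussian, so that $g$ in (6.47) is also Gaussian; the bandwidth $h$ and the toy data are arbitrary choices. Since the weights in (6.46) are normalized, the Gaussian normalizing constant cancels and is omitted.

```python
import numpy as np

def nadaraya_watson(x, x_train, t_train, h=0.5):
    """Nadaraya-Watson prediction y(x) = sum_n k(x, x_n) t_n, eq. (6.45),
    assuming a Gaussian form for g with bandwidth h (any localized
    choice of g would serve equally well)."""
    # g(x - x_n) for every training point; the normalizing constant
    # cancels between numerator and denominator of (6.46)
    g = np.exp(-0.5 * ((x - x_train) / h) ** 2)
    k = g / g.sum()                      # kernel weights, eq. (6.46)
    assert np.isclose(k.sum(), 1.0)      # summation constraint on k
    return np.dot(k, t_train)            # weighted average of targets

# hypothetical 1-D toy data: noisy samples of a sinusoid
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(20)
print(nadaraya_watson(0.3, x_train, t_train, h=0.05))  # roughly sin(0.6 * pi)
```

With a small bandwidth the prediction tracks nearby targets closely; increasing $h$ spreads the weights over more data points and smooths the regression function.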