Proof From the linearity of expectation, we can rewrite:
\[
\mathbb{E}_{S}\Bigl[\sum_{i:C_i\cap S=\emptyset} \mathbb{P}[C_i]\Bigr]
= \sum_{i=1}^{r} \mathbb{P}[C_i]\;\mathbb{E}_{S}\bigl[\mathbb{1}_{[C_i\cap S=\emptyset]}\bigr].
\]
Next, for each $i$ we have
\[
\mathbb{E}_{S}\bigl[\mathbb{1}_{[C_i\cap S=\emptyset]}\bigr]
= \mathbb{P}_{S}[C_i\cap S=\emptyset]
= (1-\mathbb{P}[C_i])^{m}
\le e^{-\mathbb{P}[C_i]\,m}.
\]
Combining the preceding two equations we get
\[
\mathbb{E}_{S}\Bigl[\sum_{i:C_i\cap S=\emptyset} \mathbb{P}[C_i]\Bigr]
\le \sum_{i=1}^{r} \mathbb{P}[C_i]\,e^{-\mathbb{P}[C_i]\,m}
\le r\,\max_{i}\,\mathbb{P}[C_i]\,e^{-\mathbb{P}[C_i]\,m}.
\]
Finally, by standard calculus, $\max_{a} a\,e^{-ma} \le \frac{1}{me}$, and this concludes the proof.
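To spell out the calculus step: the function $f(a) = a\,e^{-ma}$ has derivative $f'(a) = e^{-ma}(1 - ma)$, which vanishes at $a = 1/m$, where $f$ attains its maximum; hence
\[
\max_{a \ge 0} a\,e^{-ma} = \frac{1}{m}\,e^{-1} = \frac{1}{me}.
\]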
Equipped with the preceding lemmas we are now ready to state and prove the
main result of this section – an upper bound on the expected error of the 1-NN
learning rule.
Theorem 19.3 Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and let $\mathcal{D}$ be a distribution over $\mathcal{X}\times\mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the 1-NN rule to a sample $S \sim \mathcal{D}^m$. Then,
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\bigl[L_{\mathcal{D}}(h_S)\bigr] \le 2\,L_{\mathcal{D}}(h^\star) + 4\,c\,\sqrt{d}\;m^{-\frac{1}{d+1}}.
\]
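Before turning to the proof, a minimal sketch of the 1-NN rule that the theorem analyzes may be useful; this is our own illustration (the function name and NumPy usage are not from the book), assuming Euclidean distance with ties broken by the first minimizer.

```python
import numpy as np

def nn1_predict(X_train, y_train, x):
    """1-NN rule: predict the label of the training point nearest to x
    in the Euclidean norm (ties broken by the first minimizer)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distances ||x_i - x||
    return y_train[np.argmin(dists)]             # label of the nearest point

# Tiny usage example with X = [0,1]^2 and Y = {0,1}
X_train = np.array([[0.1, 0.2], [0.8, 0.9], [0.4, 0.5]])
y_train = np.array([0, 1, 0])
print(nn1_predict(X_train, y_train, np.array([0.7, 0.8])))  # -> 1
```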
Proof Fix some $\epsilon = 1/T$, for some integer $T$, let $r = T^d$, and let $C_1,\dots,C_r$ be the cover of the set $\mathcal{X}$ using boxes of side length $\epsilon$: namely, for every $(\alpha_1,\dots,\alpha_d) \in [T]^d$, there exists a set $C_i$ of the form $\{x : \forall j,\ x_j \in [(\alpha_j - 1)/T,\ \alpha_j/T]\}$. An illustration for $d = 2$, $T = 5$, and the set corresponding to $\alpha = (2,4)$ is given in the following.
[Figure: the unit square $[0,1]^2$, with both axes running from 0 to 1, partitioned into a $5\times 5$ grid of boxes; the box corresponding to $\alpha = (2,4)$ is highlighted.]
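As a sanity check on this cover, the following sketch (our own, assuming NumPy; `box_index` is a hypothetical helper, not the book's notation) maps a point of $[0,1]^d$ to the multi-index $\alpha$ of the box containing it; the point $(0.3, 0.75)$ lands in the box $\alpha = (2,4)$ from the illustration.

```python
import numpy as np

def box_index(x, T):
    """Return the multi-index alpha in [T]^d of the box
    C_alpha = prod_j [(alpha_j - 1)/T, alpha_j/T] containing x."""
    a = np.floor(np.asarray(x) * T).astype(int)
    a = np.minimum(a, T - 1)  # the boundary point x_j = 1 belongs to the last box
    return tuple(a + 1)

print(box_index([0.3, 0.75], T=5))  # -> (2, 4)
```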
For each $x, x'$ in the same box we have $\|x - x'\| \le \sqrt{d}\,\epsilon$. Otherwise, $\|x - x'\| \le \sqrt{d}$. Therefore,
\[
\mathbb{E}_{x,S}\bigl[\|x - x_{\pi_1(x)}\|\bigr]
\le \mathbb{E}_{S}\Biggl[\,\mathbb{P}\Bigl[\bigcup_{i:C_i\cap S=\emptyset} C_i\Bigr]\sqrt{d}
+ \mathbb{P}\Bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\Bigr]\sqrt{d}\,\epsilon\Biggr],
\]
and by combining Lemma 19.2 with the trivial bound $\mathbb{P}\bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\bigr] \le 1$ we get that
\[
\mathbb{E}_{x,S}\bigl[\|x - x_{\pi_1(x)}\|\bigr] \le \sqrt{d}\,\Bigl(\frac{r}{me} + \epsilon\Bigr).
\]
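To see this last bound in action, here is a small Monte Carlo sketch; it is our own illustration (not part of the proof) and assumes, purely for concreteness, that the marginal of $\mathcal{D}$ over $\mathcal{X}$ is uniform on $[0,1]^d$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, m, trials = 2, 5, 200, 500
eps, r = 1.0 / T, T**d

# Monte Carlo estimate of E_{x,S} ||x - x_{pi_1(x)}||:
# draw a fresh sample S and test point x, record the nearest-neighbor distance
total = 0.0
for _ in range(trials):
    S = rng.random((m, d))   # training points x_1, ..., x_m
    x = rng.random(d)        # fresh test point
    total += np.linalg.norm(S - x, axis=1).min()
estimate = total / trials

bound = np.sqrt(d) * (r / (m * np.e) + eps)
print(f"estimate {estimate:.4f}  <=  bound {bound:.4f}")
```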