Understanding Machine Learning: From Theory to Algorithms

Nearest Neighbor


Proof  From the linearity of expectation, we can rewrite:
\[
\mathop{\mathbb{E}}_{S}\left[\sum_{i:C_i\cap S=\emptyset} P[C_i]\right]
= \sum_{i=1}^{r} P[C_i]\,\mathop{\mathbb{E}}_{S}\left[\mathbb{1}_{[C_i\cap S=\emptyset]}\right].
\]

Next, for each $i$ we have
\[
\mathop{\mathbb{E}}_{S}\left[\mathbb{1}_{[C_i\cap S=\emptyset]}\right]
= \mathop{\mathbb{P}}_{S}\left[C_i\cap S=\emptyset\right]
= (1-P[C_i])^{m} \le e^{-P[C_i]\,m}.
\]

Combining the preceding two equations we get
\[
\mathop{\mathbb{E}}_{S}\left[\sum_{i:C_i\cap S=\emptyset} P[C_i]\right]
\le \sum_{i=1}^{r} P[C_i]\,e^{-P[C_i]\,m}
\le r\,\max_{i} P[C_i]\,e^{-P[C_i]\,m}.
\]

Finally, by a standard calculus argument, $\max_{a} a\,e^{-ma} \le \frac{1}{me}$, and this concludes the proof.
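(To verify the calculus step: the function $g(a) = a\,e^{-ma}$ has derivative
\[
g'(a) = e^{-ma}(1 - ma),
\]
which is positive for $a < 1/m$ and negative for $a > 1/m$, so $g$ attains its maximum at $a = 1/m$, where $g(1/m) = \frac{1}{m}e^{-1} = \frac{1}{me}$.)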

Equipped with the preceding lemmas we are now ready to state and prove the
main result of this section – an upper bound on the expected error of the 1-NN
learning rule.
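For concreteness, here is a minimal sketch of the 1-NN prediction rule analyzed in the theorem (an illustrative NumPy implementation; the names nn_rule, S_x, S_y are placeholders, not notation from the text):

    import numpy as np

    def nn_rule(S_x, S_y, x):
        # S_x: m x d array of training instances in [0,1]^d; S_y: their labels in {0, 1}.
        # The 1-NN rule h_S(x) returns the label of the training point nearest to x
        # in Euclidean distance, i.e. the label of x_{pi_1(x)}.
        dists = np.linalg.norm(S_x - x, axis=1)
        return S_y[np.argmin(dists)]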

Theorem 19.3  Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and $\mathcal{D}$ be a distribution over $\mathcal{X}\times\mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the 1-NN rule to a sample $S\sim\mathcal{D}^m$. Then,
\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[L_{\mathcal{D}}(h_S)\right]
\le 2\,L_{\mathcal{D}}(h^{\star}) + 4\,c\,\sqrt{d}\,m^{-\frac{1}{d+1}}.
\]

Proof  Fix some $\epsilon = 1/T$, for some integer $T$, let $r = T^d$, and let $C_1,\ldots,C_r$ be the cover of the set $\mathcal{X}$ using boxes of side length $\epsilon$: namely, for every $(\alpha_1,\ldots,\alpha_d)\in[T]^d$, there exists a set $C_i$ of the form $\{x : \forall j,\ x_j\in[(\alpha_j-1)/T,\ \alpha_j/T]\}$. An illustration for $d=2$, $T=5$, and the set corresponding to $\alpha=(2,4)$ is given in the following.

[Figure: the unit square $[0,1]^2$ partitioned into a $5\times 5$ grid of boxes of side $1/5$, with the box corresponding to $\alpha=(2,4)$ marked.]
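To make the box indexing concrete, here is a small illustrative sketch (the helper name box_index is a placeholder) that maps a point $x\in[0,1]^d$ to the multi-index $\alpha$ of a box containing it:

    import numpy as np

    def box_index(x, T):
        # Return alpha in [T]^d with x_j in [(alpha_j - 1)/T, alpha_j/T] for every j.
        # Points on box boundaries lie in two boxes; either choice works for the cover.
        alpha = np.ceil(np.asarray(x) * T).astype(int)
        return np.clip(alpha, 1, T)   # handles x_j = 0, which would otherwise give alpha_j = 0

For instance, with $d=2$ and $T=5$, the point $(0.25, 0.7)$ is mapped to $\alpha=(2,4)$, the box of the illustration above.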

For each $x, x'$ in the same box we have $\|x - x'\| \le \epsilon\sqrt{d}$. Otherwise, $\|x - x'\| \le \sqrt{d}$.
Therefore,

\[
\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]
\le \mathop{\mathbb{E}}_{S}\left[
P\Bigl[\,\bigcup_{i:C_i\cap S=\emptyset} C_i\Bigr]\sqrt{d}
+ P\Bigl[\,\bigcup_{i:C_i\cap S\neq\emptyset} C_i\Bigr]\,\epsilon\sqrt{d}
\right],
\]

and by combining Lemma 19.2 with the trivial bound $P\bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\bigr] \le 1$ we get that
\[
\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]
\le \sqrt{d}\left(\frac{r}{me} + \epsilon\right).
\]
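As a rough numerical illustration of this bound (a sketch assuming the uniform distribution over $[0,1]^d$ and arbitrarily chosen values of $m$ and $T$; it is not part of the proof), one can compare an empirical estimate of $\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]$ with the right-hand side:

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, T = 2, 1000, 5                # dimension, sample size, grid resolution
    eps, r = 1.0 / T, T ** d            # box side length eps = 1/T, number of boxes r = T^d

    # Monte Carlo estimate of the expected distance from a fresh point x to its
    # nearest neighbor in S, with x and the points of S drawn uniformly from [0,1]^d.
    trials = 200
    total = 0.0
    for _ in range(trials):
        S = rng.random((m, d))
        x = rng.random(d)
        total += np.linalg.norm(S - x, axis=1).min()
    estimate = total / trials

    bound = np.sqrt(d) * (r / (m * np.e) + eps)   # sqrt(d) * (r/(me) + eps)
    print(f"empirical estimate: {estimate:.3f}   bound: {bound:.3f}")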