Understanding Machine Learning: From Theory to Algorithms

Nearest Neighbor


Proof  From the linearity of expectation, we can rewrite:
\[
\mathop{\mathbb{E}}_{S}\left[\sum_{i:C_i\cap S=\emptyset} P[C_i]\right]
= \sum_{i=1}^{r} P[C_i]\,\mathop{\mathbb{E}}_{S}\left[\mathbb{1}_{[C_i\cap S=\emptyset]}\right].
\]

Next, for each $i$ we have
\[
\mathop{\mathbb{E}}_{S}\left[\mathbb{1}_{[C_i\cap S=\emptyset]}\right]
= \mathop{\mathbb{P}}_{S}\left[C_i\cap S=\emptyset\right]
= (1-P[C_i])^{m} \le e^{-P[C_i]\,m}.
\]

Combining the preceding two equations we get
\[
\mathop{\mathbb{E}}_{S}\left[\sum_{i:C_i\cap S=\emptyset} P[C_i]\right]
\le \sum_{i=1}^{r} P[C_i]\,e^{-P[C_i]\,m}
\le r\,\max_{i} P[C_i]\,e^{-P[C_i]\,m}.
\]

Finally, by a standard calculus argument, $\max_{a} a\,e^{-ma} \le \frac{1}{me}$, and this concludes the proof.
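(To verify the calculus step: the function $g(a) = a\,e^{-ma}$ has derivative
\[
g'(a) = e^{-ma}(1 - ma),
\]
which is positive for $a < 1/m$ and negative for $a > 1/m$, so $g$ attains its maximum at $a = 1/m$, where $g(1/m) = \frac{1}{m}e^{-1} = \frac{1}{me}$.)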

Equipped with the preceding lemmas we are now ready to state and prove the
main result of this section – an upper bound on the expected error of the 1-NN
learning rule.
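For concreteness, here is a minimal sketch of the 1-NN prediction rule analyzed in the theorem (an illustrative NumPy implementation; the names nn_rule, S_x, S_y are placeholders, not notation from the text):

    import numpy as np

    def nn_rule(S_x, S_y, x):
        # S_x: m x d array of training instances in [0,1]^d; S_y: their labels in {0, 1}.
        # The 1-NN rule h_S(x) returns the label of the training point nearest to x
        # in Euclidean distance, i.e. the label of x_{pi_1(x)}.
        dists = np.linalg.norm(S_x - x, axis=1)
        return S_y[np.argmin(dists)]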

Theorem 19.3  Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and $\mathcal{D}$ be a distribution over $\mathcal{X}\times\mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the 1-NN rule to a sample $S\sim\mathcal{D}^m$. Then,
\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\left[L_{\mathcal{D}}(h_S)\right]
\le 2\,L_{\mathcal{D}}(h^{\star}) + 4\,c\,\sqrt{d}\,m^{-\frac{1}{d+1}}.
\]

Proof  Fix some $\epsilon = 1/T$, for some integer $T$, let $r = T^d$, and let $C_1,\ldots,C_r$ be the cover of the set $\mathcal{X}$ using boxes of side length $\epsilon$: namely, for every $(\alpha_1,\ldots,\alpha_d)\in[T]^d$, there exists a set $C_i$ of the form $\{x : \forall j,\ x_j\in[(\alpha_j-1)/T,\ \alpha_j/T]\}$. An illustration for $d=2$, $T=5$, and the set corresponding to $\alpha=(2,4)$ is given in the following.

[Figure: the unit square $[0,1]^2$ partitioned into a $5\times 5$ grid of boxes of side $1/5$, with the box corresponding to $\alpha=(2,4)$ marked.]
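To make the box indexing concrete, here is a small illustrative sketch (the helper name box_index is a placeholder) that maps a point $x\in[0,1]^d$ to the multi-index $\alpha$ of a box containing it:

    import numpy as np

    def box_index(x, T):
        # Return alpha in [T]^d with x_j in [(alpha_j - 1)/T, alpha_j/T] for every j.
        # Points on box boundaries lie in two boxes; either choice works for the cover.
        alpha = np.ceil(np.asarray(x) * T).astype(int)
        return np.clip(alpha, 1, T)   # handles x_j = 0, which would otherwise give alpha_j = 0

For instance, with $d=2$ and $T=5$, the point $(0.25, 0.7)$ is mapped to $\alpha=(2,4)$, the box of the illustration above.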

For each $x, x'$ in the same box we have $\|x - x'\| \le \epsilon\sqrt{d}$. Otherwise, $\|x - x'\| \le \sqrt{d}$.
Therefore,

\[
\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]
\le \mathop{\mathbb{E}}_{S}\left[
P\Bigl[\,\bigcup_{i:C_i\cap S=\emptyset} C_i\Bigr]\sqrt{d}
+ P\Bigl[\,\bigcup_{i:C_i\cap S\neq\emptyset} C_i\Bigr]\,\epsilon\sqrt{d}
\right],
\]

and by combining Lemma 19.2 with the trivial bound $P\bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\bigr] \le 1$ we get that
\[
\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]
\le \sqrt{d}\left(\frac{r}{me} + \epsilon\right).
\]
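As a rough numerical illustration of this bound (a sketch assuming the uniform distribution over $[0,1]^d$ and arbitrarily chosen values of $m$ and $T$; it is not part of the proof), one can compare an empirical estimate of $\mathop{\mathbb{E}}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]$ with the right-hand side:

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, T = 2, 1000, 5                # dimension, sample size, grid resolution
    eps, r = 1.0 / T, T ** d            # box side length eps = 1/T, number of boxes r = T^d

    # Monte Carlo estimate of the expected distance from a fresh point x to its
    # nearest neighbor in S, with x and the points of S drawn uniformly from [0,1]^d.
    trials = 200
    total = 0.0
    for _ in range(trials):
        S = rng.random((m, d))
        x = rng.random(d)
        total += np.linalg.norm(S - x, axis=1).min()
    estimate = total / trials

    bound = np.sqrt(d) * (r / (m * np.e) + eps)   # sqrt(d) * (r/(me) + eps)
    print(f"empirical estimate: {estimate:.3f}   bound: {bound:.3f}")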