Understanding Machine Learning: From Theory to Algorithms

is consistent (with respect to the hypothesis class of all functions from $\mathbb{R}^d$ to $\{0,1\}$). A good presentation of the analysis is given in the book of Devroye et al. (1996). Here, we give a finite sample guarantee that explicitly underscores the prior assumption on the distribution. See Section 7.4 for a discussion on consistency results. Finally, Gottlieb, Kontorovich & Krauthgamer (2010) derived another finite sample bound for NN that is more similar to VC bounds.

19.6 Exercises


In this exercise we will prove the following theorem for the $k$-NN rule.

Theorem 19.5  Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and $\mathcal{D}$ be a distribution over $\mathcal{X} \times \mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the $k$-NN rule to a sample $S \sim \mathcal{D}^m$, where $k \geq 10$. Let $h^\star$ be the Bayes optimal hypothesis. Then,
\[
\mathbb{E}_{S}\left[L_{\mathcal{D}}(h_S)\right] \leq \left(1 + \sqrt{\frac{8}{k}}\right) L_{\mathcal{D}}(h^\star) + \left(6c\sqrt{d} + k\right) m^{-1/(d+1)}.
\]
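For concreteness, here is a minimal sketch (not from the text) of the majority-vote predictor that $h_S$ denotes: a query point is labeled by the most common label among its $k$ nearest sample points under the Euclidean metric. The function and variable names (knn_predict, X_train, y_train) are illustrative only.

    import numpy as np

    def knn_predict(X_train, y_train, x, k=10):
        # X_train: (m, d) array of sample points in [0, 1]^d; y_train: (m,) array of {0, 1} labels.
        # Distances from the query point x to every sample point.
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest neighbors.
        nearest = np.argsort(dists)[:k]
        # Majority vote over their labels (ties broken towards 1).
        return int(2 * y_train[nearest].sum() >= k)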


  1. Prove the following lemma.

     Lemma 19.6  Let $C_1, \ldots, C_r$ be a collection of subsets of some domain set, $\mathcal{X}$. Let $S$ be a sequence of $m$ points sampled i.i.d. according to some probability distribution $\mathcal{D}$ over $\mathcal{X}$. Then, for every $k \geq 2$,
     \[
     \mathbb{E}_{S \sim \mathcal{D}^m}\left[\,\sum_{i : |C_i \cap S| < k} \mathbb{P}[C_i]\right] \leq \frac{2rk}{m}.
     \]

Hints:


  • Show that
    \[
    \mathbb{E}_{S}\left[\,\sum_{i : |C_i \cap S| < k} \mathbb{P}[C_i]\right] = \sum_{i=1}^{r} \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right].
    \]


  • Fix some $i$ and suppose that $k < \mathbb{P}[C_i]\, m/2$. Use Chernoff's bound to show that
    \[
    \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}_{S}\left[|C_i \cap S| < \mathbb{P}[C_i]\, m/2\right] \leq e^{-\mathbb{P}[C_i]\, m/8}.
    \]


  • Use the inequality $\max_{a} a\, e^{-ma} \leq \frac{1}{me}$ to show that for such $i$ we have
    \[
    \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}[C_i]\, e^{-\mathbb{P}[C_i]\, m/8} \leq \frac{8}{me}.
    \]


  • Conclude the proof by using the fact that for the case $k \geq \mathbb{P}[C_i]\, m/2$ we clearly have:
    \[
    \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}[C_i] \leq \frac{2k}{m}.
    \]
    (A sketch combining these steps is given after the hints.)
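A possible way to assemble the hints (the multiplicative form of Chernoff's bound used below is one standard choice, and an assumption on the intended variant): the first identity follows by writing the sum as $\sum_{i=1}^{r} \mathbb{P}[C_i]\, \mathbb{1}[|C_i \cap S| < k]$ and applying linearity of expectation together with $\mathbb{E}[\mathbb{1}[A]] = \mathbb{P}[A]$. Since $S$ consists of $m$ i.i.d. draws from $\mathcal{D}$, the count $|C_i \cap S|$ is a sum of $m$ independent Bernoulli variables with parameter $\mathbb{P}[C_i]$, so its mean is $\mu = \mathbb{P}[C_i]\, m$; the lower-tail Chernoff bound $\mathbb{P}[X < (1-\delta)\mu] \leq e^{-\mu \delta^2/2}$ with $\delta = 1/2$ then gives $e^{-\mu/8}$. For the third step, the function $g(a) = a\, e^{-(m/8)a}$ is maximized at $a = 8/m$, where $g(8/m) = \frac{8}{me}$, so taking $a = \mathbb{P}[C_i]$ yields the bound $\frac{8}{me}$. Finally, every $i$ falls into one of the two cases, and since $k \geq 2$ we have $\frac{8}{me} \leq \frac{2k}{m}$ (as $8/e < 4 \leq 2k$); hence each summand $\mathbb{P}[C_i]\, \mathbb{P}_{S}[|C_i \cap S| < k]$ is at most $\frac{2k}{m}$, and summing over the $r$ sets gives $\frac{2rk}{m}$.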