Understanding Machine Learning: From Theory to Algorithms

is consistent (with respect to the hypothesis class of all functions from $\mathbb{R}^d$ to $\{0,1\}$). A good presentation of the analysis is given in the book of Devroye et al. (1996). Here, we give a finite sample guarantee that explicitly underscores the prior assumption on the distribution. See Section 7.4 for a discussion on consistency results. Finally, Gottlieb, Kontorovich & Krauthgamer (2010) derived another finite sample bound for NN that is more similar to VC bounds.

19.6 Exercises


In this exercise we will prove the following theorem for the $k$-NN rule.

Theorem 19.5  Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and $\mathcal{D}$ be a distribution over $\mathcal{X} \times \mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the $k$-NN rule to a sample $S \sim \mathcal{D}^m$, where $k \geq 10$. Let $h^\star$ be the Bayes optimal hypothesis. Then,
\[
\mathbb{E}_{S}\left[L_{\mathcal{D}}(h_S)\right] \leq \left(1 + \sqrt{\frac{8}{k}}\right) L_{\mathcal{D}}(h^\star) + \left(6c\sqrt{d} + k\right) m^{-1/(d+1)}.
\]
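For concreteness, here is a minimal sketch (not from the text) of the majority-vote predictor that $h_S$ denotes: a query point is labeled by the most common label among its $k$ nearest sample points under the Euclidean metric. The function and variable names (knn_predict, X_train, y_train) are illustrative only.

    import numpy as np

    def knn_predict(X_train, y_train, x, k=10):
        # X_train: (m, d) array of sample points in [0, 1]^d; y_train: (m,) array of {0, 1} labels.
        # Distances from the query point x to every sample point.
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest neighbors.
        nearest = np.argsort(dists)[:k]
        # Majority vote over their labels (ties broken towards 1).
        return int(2 * y_train[nearest].sum() >= k)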


  1. Prove the following lemma.

     Lemma 19.6  Let $C_1, \ldots, C_r$ be a collection of subsets of some domain set, $\mathcal{X}$. Let $S$ be a sequence of $m$ points sampled i.i.d. according to some probability distribution $\mathcal{D}$ over $\mathcal{X}$. Then, for every $k \geq 2$,
     \[
     \mathbb{E}_{S \sim \mathcal{D}^m}\left[\,\sum_{i : |C_i \cap S| < k} \mathbb{P}[C_i]\right] \leq \frac{2rk}{m}.
     \]

Hints:


  • Show that
    \[
    \mathbb{E}_{S}\left[\,\sum_{i : |C_i \cap S| < k} \mathbb{P}[C_i]\right] = \sum_{i=1}^{r} \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right].
    \]


  • Fix some $i$ and suppose that $k < \mathbb{P}[C_i]\, m/2$. Use Chernoff's bound to show that
    \[
    \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}_{S}\left[|C_i \cap S| < \mathbb{P}[C_i]\, m/2\right] \leq e^{-\mathbb{P}[C_i]\, m/8}.
    \]


  • Use the inequality $\max_{a} a\, e^{-ma} \leq \frac{1}{me}$ to show that for such $i$ we have
    \[
    \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}[C_i]\, e^{-\mathbb{P}[C_i]\, m/8} \leq \frac{8}{me}.
    \]


  • Conclude the proof by using the fact that for the case $k \geq \mathbb{P}[C_i]\, m/2$ we clearly have:
    \[
    \mathbb{P}[C_i]\, \mathbb{P}_{S}\left[|C_i \cap S| < k\right] \leq \mathbb{P}[C_i] \leq \frac{2k}{m}.
    \]
    (A sketch combining these steps is given after the hints.)
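A possible way to assemble the hints (the multiplicative form of Chernoff's bound used below is one standard choice, and an assumption on the intended variant): the first identity follows by writing the sum as $\sum_{i=1}^{r} \mathbb{P}[C_i]\, \mathbb{1}[|C_i \cap S| < k]$ and applying linearity of expectation together with $\mathbb{E}[\mathbb{1}[A]] = \mathbb{P}[A]$. Since $S$ consists of $m$ i.i.d. draws from $\mathcal{D}$, the count $|C_i \cap S|$ is a sum of $m$ independent Bernoulli variables with parameter $\mathbb{P}[C_i]$, so its mean is $\mu = \mathbb{P}[C_i]\, m$; the lower-tail Chernoff bound $\mathbb{P}[X < (1-\delta)\mu] \leq e^{-\mu \delta^2/2}$ with $\delta = 1/2$ then gives $e^{-\mu/8}$. For the third step, the function $g(a) = a\, e^{-(m/8)a}$ is maximized at $a = 8/m$, where $g(8/m) = \frac{8}{me}$, so taking $a = \mathbb{P}[C_i]$ yields the bound $\frac{8}{me}$. Finally, every $i$ falls into one of the two cases, and since $k \geq 2$ we have $\frac{8}{me} \leq \frac{2k}{m}$ (as $8/e < 4 \leq 2k$); hence each summand $\mathbb{P}[C_i]\, \mathbb{P}_{S}[|C_i \cap S| < k]$ is at most $\frac{2k}{m}$, and summing over the $r$ sets gives $\frac{2rk}{m}$.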