Understanding Machine Learning: From Theory to Algorithms

266 Nearest Neighbor

We use the notationy∼pas a shorthand for “yis a Bernoulli random variable
with expected valuep.” Prove the following lemma:
lemma19.7 Letk≥ 10 and letZ 1 ,...,Zkbe independent Bernoulli random
variables withP[Zi= 1] =pi. Denotep=^1 k

∑

ipiandp

′= 1

k

∑k i=1Zi. Show that

Z E

1 ,...,Zk y∼Pp[y^6 =^1 [p′>^1 /2]]≤

(

1 +

√

8

k

)

y∼Pp[y^6 =^1 [p>^1 /2]].

Hints:

W.l.o.g. assume thatp≤ 1 /2. Then,Py∼p[y 6 = (^1) [p> 1 /2]] =p. Lety′= (^1) [p′> 1 /2].

Show that

Z E 1 ,...,Zk y∼Pp[y^6 =y′]−p=Z P 1 ,...,Zk

[p′> 1 /2](1− 2 p).

Use Chernoff’s bound (Lemma B.3) to show that

P[p′> 1 /2]≤e−kph( 21 p−^1 ) ,

where h(a) = (1 +a) log(1 +a)−a.

To conclude the proof of the lemma, you can rely on the following inequality
(without proving it): For everyp∈[0, 1 /2] andk≥10:

(1− 2 p)e−kp+

k 2 (log(2p)+1) ≤

√

8

kp.

Fix somep,p′∈[0,1] andy′∈{ 0 , 1 }. Show that

yP∼p[y^6 =y′]≤y∼Pp′[y^6 =y′] +|p−p′|.

Conclude the proof of the theorem according to the following steps:

As in the proof of Theorem 19.3, six some >0 and letC 1 ,...,Crbe the
cover of the setXusing boxes of length. For eachx,x′in the same
box we have‖x−x′‖≤

√

d. Otherwise,‖x−x′‖≤ 2

√

d. Show that

ES[LD(hS)]≤ES





∑

i:|Ci∩S|<k

P[Ci]





+ maxi P S,(x,y)

[

hS(x) 6 =y| ∀j∈[k],‖x−xπj(x)‖≤

√

d

]

. (19.3)

Bound the first summand using Lemma 19.6.

To bound the second summand, let us fixS|xandxsuch that all thek
neighbors ofxinS|xare at distance of at most

√

dfromx. W.l.o.g assume that thekNN arex 1 ,...,xk. Denotepi=η(xi) and letp= 1 k

∑

ipi. Use Exercise 3 to show that

y E 1 ,...,yj

P

y∼η(x)

[hS(x) 6 =y]≤y E 1 ,...,yj y∼Pp[hS(x)^6 =y] +|p−η(x)|.

Understanding Machine Learning: From Theory to Algorithms

∑

′= 1

Z E

(

1 +

√

8

)

√

8

√

√





∑





[

√

]

. (19.3)

√

∑

P

Get our desktop app

Company

Features

Documentation

Resources