Understanding Machine Learning: From Theory to Algorithms

Measure Concentration


Solving for $t$ yields

$$
\frac{t^2/2}{m L_{\mathcal{D}}(h) + t/3} = \log(1/\delta)
$$
$$
\Rightarrow\quad \frac{t^2}{2} - \frac{\log(1/\delta)}{3}\,t - \log(1/\delta)\,m L_{\mathcal{D}}(h) = 0
$$
$$
\Rightarrow\quad t = \frac{\log(1/\delta)}{3} + \sqrt{\frac{\log^2(1/\delta)}{9} + 2\log(1/\delta)\,m L_{\mathcal{D}}(h)}
\;\le\; \frac{2\log(1/\delta)}{3} + \sqrt{2\log(1/\delta)\,m L_{\mathcal{D}}(h)},
$$

where the last inequality uses $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$.

Since $\frac{1}{m}\sum_i \alpha_i = L_S(h) - L_{\mathcal{D}}(h)$, it follows that with probability of at least $1-\delta$,

$$
L_S(h) - L_{\mathcal{D}}(h) \;\le\; \frac{2\log(1/\delta)}{3m} + \sqrt{\frac{2\log(1/\delta)\,L_{\mathcal{D}}(h)}{m}},
$$

which proves the first inequality. The second part of the lemma follows in a
similar way.
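The algebra above can be checked numerically. The following is a minimal sketch, with hypothetical values of $\delta$, $m$, and $L_{\mathcal{D}}(h)$ chosen for illustration only:

```python
import math

# Hypothetical values for illustration (not from the text).
delta, m, L = 0.05, 1000, 0.1   # L plays the role of L_D(h)
b = math.log(1 / delta)          # b = log(1/delta)

# Positive root of t^2/2 - (b/3) t - b*m*L = 0 -- the closed form derived above.
t = b / 3 + math.sqrt(b ** 2 / 9 + 2 * b * m * L)

# It satisfies the original equation (t^2/2) / (m*L + t/3) = log(1/delta) ...
assert abs(t ** 2 / 2 - (m * L + t / 3) * b) < 1e-6

# ... and is dominated by the looser bound 2b/3 + sqrt(2*b*m*L),
# since sqrt(x + y) <= sqrt(x) + sqrt(y).
assert t <= 2 * b / 3 + math.sqrt(2 * b * m * L)
```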

B.6 Slud’s Inequality


Let $X$ be a $(m,p)$ binomial variable. That is, $X = \sum_{i=1}^{m} Z_i$, where each $Z_i$ is 1 with probability $p$ and 0 with probability $1-p$. Assume that $p = (1-\epsilon)/2$. Slud's inequality (Slud 1977) tells us that $\mathbb{P}[X \ge m/2]$ is lower bounded by the probability that a normal variable will be greater than or equal to $\sqrt{m\epsilon^2/(1-\epsilon^2)}$. The following lemma follows by standard tail bounds for the normal distribution.

lemma B.11  Let $X$ be a $(m,p)$ binomial variable and assume that $p = (1-\epsilon)/2$. Then,
$$
\mathbb{P}[X \ge m/2] \;\ge\; \frac{1}{2}\left(1 - \sqrt{1 - \exp\!\left(-\frac{m\epsilon^2}{1-\epsilon^2}\right)}\right).
$$
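The lower bound of Lemma B.11 can be sanity-checked by simulation. A minimal sketch, with hypothetical values of $m$ and $\epsilon$ chosen for illustration:

```python
import math
import random

random.seed(0)

# Hypothetical parameters for illustration.
m, eps = 50, 0.2
p = (1 - eps) / 2   # success probability, as in the lemma

# Lower bound from Lemma B.11: (1/2)(1 - sqrt(1 - exp(-m*eps^2/(1-eps^2)))).
lower = 0.5 * (1 - math.sqrt(1 - math.exp(-m * eps ** 2 / (1 - eps ** 2))))

# Monte Carlo estimate of P[X >= m/2] for X ~ Binomial(m, p).
trials = 100_000
hits = sum(
    sum(random.random() < p for _ in range(m)) >= m / 2
    for _ in range(trials)
)
estimate = hits / trials

assert estimate >= lower   # the bound holds on this sample
```

For these values the estimated tail probability comfortably exceeds the bound, which is far from tight here; the inequality is most useful when proving lower bounds on the error of learning algorithms.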

B.7 Concentration of $\chi^2$ Variables


Let $X_1,\ldots,X_k$ be $k$ independent normally distributed random variables. That is, for all $i$, $X_i \sim N(0,1)$. The distribution of the random variable $X_i^2$ is called $\chi^2$ (chi square) and the distribution of the random variable $Z = X_1^2 + \cdots + X_k^2$ is called $\chi^2_k$ (chi square with $k$ degrees of freedom). Clearly, $\mathbb{E}[X_i^2] = 1$ and $\mathbb{E}[Z] = k$. The following lemma states that a $\chi^2_k$ variable is concentrated around its mean.

lemma B.12  Let $Z \sim \chi^2_k$. Then, for all $\epsilon > 0$ we have
$$
\mathbb{P}[Z \le (1-\epsilon)k] \le e^{-\epsilon^2 k/6},
$$
and for all $\epsilon \in (0,3)$ we have
$$
\mathbb{P}[Z \ge (1+\epsilon)k] \le e^{-\epsilon^2 k/6}.
$$
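Both tails of Lemma B.12 can be checked the same way, by sampling $Z$ as a sum of squared standard normals. A minimal sketch, with hypothetical $k$ and $\epsilon$:

```python
import math
import random

random.seed(0)

# Hypothetical parameters for illustration.
k, eps, trials = 40, 0.5, 50_000

low_tail = high_tail = 0
for _ in range(trials):
    # Z ~ chi^2_k: sum of k squared independent N(0,1) draws.
    z = sum(random.gauss(0, 1) ** 2 for _ in range(k))
    low_tail += z <= (1 - eps) * k
    high_tail += z >= (1 + eps) * k

bound = math.exp(-eps ** 2 * k / 6)   # e^{-eps^2 k / 6}, same for both tails
assert low_tail / trials <= bound
assert high_tail / trials <= bound
```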
