Understanding Machine Learning: From Theory to Algorithms

Measure Concentration


Solving for $t$ yields

$$
\frac{t^2/2}{m L_{\mathcal{D}}(h) + t/3} = \log(1/\delta)
$$
$$
\Rightarrow\quad \frac{t^2}{2} - \frac{\log(1/\delta)}{3}\,t - \log(1/\delta)\,m L_{\mathcal{D}}(h) = 0
$$
$$
\Rightarrow\quad t = \frac{\log(1/\delta)}{3} + \sqrt{\frac{\log^2(1/\delta)}{9} + 2\log(1/\delta)\,m L_{\mathcal{D}}(h)}
\;\le\; \frac{2\log(1/\delta)}{3} + \sqrt{2\log(1/\delta)\,m L_{\mathcal{D}}(h)},
$$

where the last inequality uses $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$.

Since $\frac{1}{m}\sum_i \alpha_i = L_S(h) - L_{\mathcal{D}}(h)$, it follows that with probability of at least $1-\delta$,

$$
L_S(h) - L_{\mathcal{D}}(h) \;\le\; \frac{2\log(1/\delta)}{3m} + \sqrt{\frac{2\log(1/\delta)\,L_{\mathcal{D}}(h)}{m}},
$$

which proves the first inequality. The second part of the lemma follows in a
similar way.
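The algebra above can be checked numerically. The following is a minimal sketch, with hypothetical values of $\delta$, $m$, and $L_{\mathcal{D}}(h)$ chosen for illustration only:

```python
import math

# Hypothetical values for illustration (not from the text).
delta, m, L = 0.05, 1000, 0.1   # L plays the role of L_D(h)
b = math.log(1 / delta)          # b = log(1/delta)

# Positive root of t^2/2 - (b/3) t - b*m*L = 0 -- the closed form derived above.
t = b / 3 + math.sqrt(b ** 2 / 9 + 2 * b * m * L)

# It satisfies the original equation (t^2/2) / (m*L + t/3) = log(1/delta) ...
assert abs(t ** 2 / 2 - (m * L + t / 3) * b) < 1e-6

# ... and is dominated by the looser bound 2b/3 + sqrt(2*b*m*L),
# since sqrt(x + y) <= sqrt(x) + sqrt(y).
assert t <= 2 * b / 3 + math.sqrt(2 * b * m * L)
```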

B.6 Slud’s Inequality


Let $X$ be a $(m,p)$ binomial variable. That is, $X = \sum_{i=1}^{m} Z_i$, where each $Z_i$ is 1 with probability $p$ and 0 with probability $1-p$. Assume that $p = (1-\epsilon)/2$. Slud's inequality (Slud 1977) tells us that $\mathbb{P}[X \ge m/2]$ is lower bounded by the probability that a normal variable will be greater than or equal to $\sqrt{m\epsilon^2/(1-\epsilon^2)}$. The following lemma follows by standard tail bounds for the normal distribution.

lemma B.11  Let $X$ be a $(m,p)$ binomial variable and assume that $p = (1-\epsilon)/2$. Then,
$$
\mathbb{P}[X \ge m/2] \;\ge\; \frac{1}{2}\left(1 - \sqrt{1 - \exp\!\left(-\frac{m\epsilon^2}{1-\epsilon^2}\right)}\right).
$$
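The lower bound of Lemma B.11 can be sanity-checked by simulation. A minimal sketch, with hypothetical values of $m$ and $\epsilon$ chosen for illustration:

```python
import math
import random

random.seed(0)

# Hypothetical parameters for illustration.
m, eps = 50, 0.2
p = (1 - eps) / 2   # success probability, as in the lemma

# Lower bound from Lemma B.11: (1/2)(1 - sqrt(1 - exp(-m*eps^2/(1-eps^2)))).
lower = 0.5 * (1 - math.sqrt(1 - math.exp(-m * eps ** 2 / (1 - eps ** 2))))

# Monte Carlo estimate of P[X >= m/2] for X ~ Binomial(m, p).
trials = 100_000
hits = sum(
    sum(random.random() < p for _ in range(m)) >= m / 2
    for _ in range(trials)
)
estimate = hits / trials

assert estimate >= lower   # the bound holds on this sample
```

For these values the estimated tail probability comfortably exceeds the bound, which is far from tight here; the inequality is most useful when proving lower bounds on the error of learning algorithms.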

B.7 Concentration of $\chi^2$ Variables


Let $X_1,\ldots,X_k$ be $k$ independent normally distributed random variables. That is, for all $i$, $X_i \sim N(0,1)$. The distribution of the random variable $X_i^2$ is called $\chi^2$ (chi square) and the distribution of the random variable $Z = X_1^2 + \cdots + X_k^2$ is called $\chi^2_k$ (chi square with $k$ degrees of freedom). Clearly, $\mathbb{E}[X_i^2] = 1$ and $\mathbb{E}[Z] = k$. The following lemma states that a $\chi^2_k$ variable is concentrated around its mean.

lemma B.12  Let $Z \sim \chi^2_k$. Then, for all $\epsilon > 0$ we have
$$
\mathbb{P}[Z \le (1-\epsilon)k] \le e^{-\epsilon^2 k/6},
$$
and for all $\epsilon \in (0,3)$ we have
$$
\mathbb{P}[Z \ge (1+\epsilon)k] \le e^{-\epsilon^2 k/6}.
$$
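Both tails of Lemma B.12 can be checked the same way, by sampling $Z$ as a sum of squared standard normals. A minimal sketch, with hypothetical $k$ and $\epsilon$:

```python
import math
import random

random.seed(0)

# Hypothetical parameters for illustration.
k, eps, trials = 40, 0.5, 50_000

low_tail = high_tail = 0
for _ in range(trials):
    # Z ~ chi^2_k: sum of k squared independent N(0,1) draws.
    z = sum(random.gauss(0, 1) ** 2 for _ in range(k))
    low_tail += z <= (1 - eps) * k
    high_tail += z >= (1 + eps) * k

bound = math.exp(-eps ** 2 * k / 6)   # e^{-eps^2 k / 6}, same for both tails
assert low_tail / trials <= bound
assert high_tail / trials <= bound
```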
