Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
B.4 Hoeffding’s Inequality 425

and,

E[e−tZ] =E[e−t


iZi] =E[


i

e−tZi]

=


i

E[e−tZi] by independence

=


i

(

1 +pi(e−t−1)

)



i

epi(e

−t−1)
using 1 +x≤ex

=e(e

−t−1)p
.

Settingt=−log(1−δ) yields

P[Z <(1−δ)p] ≤ e

−δp
e(1−δ) log(1−δ)p

=e−ph(−δ).

It is easy to verify thath(−δ)≥h(δ) and hence

lemmaB.5 Using the notation of Lemma B.3 we also have

P[Z <(1−δ)p] ≤ e−ph(−δ)≤e−ph(δ)≤e−p
2+2δ^2 δ/ 3
.

B.4 Hoeffding’s Inequality


lemma B.6 (Hoeffding’s inequality) LetZ 1 ,...,Zm be a sequence of i.i.d.
random variables and letZ ̄=m^1

∑m
i=1Zi. Assume thatE[Z ̄] =μandP[a≤
Zi≤b] = 1for everyi. Then, for any > 0

P

[∣∣

∣∣


1
m

∑m

i=1

Zi−μ

∣∣

∣∣


> 

]

≤ 2 exp

(

− 2 m^2 /(b−a)^2

)

.

Proof DenoteXi=Zi−E[Zi] andX ̄=m^1


iXi. Using the monotonicity of
the exponent function and Markov’s inequality, we have that for everyλ > 0
and >0,
P[X ̄≥] =P[eλX ̄≥eλ]≤e−λE[eλX ̄].

Using the independence assumption we also have

E[eλX ̄] =E

[


i

eλXi/m

]

=


i

E[eλXi/m].

By Hoeffding’s lemma (Lemma B.7 later), for everyiwe have

E[eλXi/m]≤e

λ^2 (b−a)^2
8 m^2.
Free download pdf