B.5 Bennett's and Bernstein's Inequalities
Then for all $\epsilon > 0$,
\[
P\left[\,\sum_{i=1}^{m} Z_i > \epsilon\right] \;\le\; e^{-m\sigma^2\, h\left(\frac{\epsilon}{m\sigma^2}\right)},
\]
where
\[
h(a) = (1+a)\log(1+a) - a.
\]
By using the inequality $h(a) \ge \frac{a^2}{2 + 2a/3}$ it is possible to derive the following:
Lemma B.9 (Bernstein's inequality) Let $Z_1,\ldots,Z_m$ be i.i.d. random variables with a zero mean. If for all $i$, $P(|Z_i| < M) = 1$, then for all $t > 0$:
\[
P\left[\,\sum_{i=1}^{m} Z_i > t\right] \;\le\; \exp\left( -\frac{t^2/2}{\sum_j \mathbb{E}[Z_j^2] + Mt/3} \right).
\]
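To see how Bernstein's bound follows, one can apply Bennett's inequality to the rescaled variables $Y_i = Z_i/M$ (so that they are bounded by 1) and then invoke the inequality $h(a) \ge a^2/(2+2a/3)$ stated above. A sketch of the calculation, assuming Bennett's inequality applies to the $Y_i$'s:

```latex
% Take Y_i = Z_i/M, so |Y_i| <= 1, and set \sigma^2 = \frac{1}{m}\sum_j \mathbb{E}[Y_j^2],
% hence m\sigma^2 M^2 = \sum_j \mathbb{E}[Z_j^2]. Bennett's inequality gives
\[
P\Big[\sum_{i=1}^{m} Z_i > t\Big]
= P\Big[\sum_{i=1}^{m} Y_i > \tfrac{t}{M}\Big]
\le e^{-m\sigma^2 h(a)}, \qquad a = \frac{t}{M m \sigma^2}.
\]
% Applying h(a) \ge a^2/(2 + 2a/3) to the exponent:
\[
m\sigma^2\, h(a)
\;\ge\; \frac{m\sigma^2 a^2}{2 + 2a/3}
\;=\; \frac{t^2}{2 M^2 m \sigma^2 + 2Mt/3}
\;=\; \frac{t^2/2}{\sum_j \mathbb{E}[Z_j^2] + Mt/3},
\]
% which is exactly the exponent appearing in Bernstein's inequality.
```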
B.5.1 Application
Bernstein's inequality can be used to interpolate between the rate $1/\epsilon$ we derived for PAC learning in the realizable case (in Chapter 2) and the rate $1/\epsilon^2$ we derived for the unrealizable case (in Chapter 4).
Lemma B.10 Let $\ell : H \times Z \to [0,1]$ be a loss function. Let $D$ be an arbitrary distribution over $Z$. Fix some $h$. Then, for any $\delta \in (0,1)$ we have

1. $\;P_{S \sim D^m}\!\left[\, L_S(h) \ge L_D(h) + \sqrt{\dfrac{2\,L_D(h)\log(1/\delta)}{m}} + \dfrac{2\log(1/\delta)}{3m} \,\right] \le \delta$

2. $\;P_{S \sim D^m}\!\left[\, L_D(h) \ge L_S(h) + \sqrt{\dfrac{2\,L_S(h)\log(1/\delta)}{m}} + \dfrac{4\log(1/\delta)}{m} \,\right] \le \delta$
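As a sanity check, the second claim of the lemma can be tested numerically. The sketch below uses a 0/1 loss whose values are i.i.d. Bernoulli($p$), so that $L_D(h) = p$ and $L_S(h)$ is a sample mean; the Bernoulli setup and all constants are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Empirical check of claim 2 for a 0/1 loss: the loss values l(h, z_i)
# are i.i.d. Bernoulli(p), so L_D(h) = p and L_S(h) is the sample mean.
# (The Bernoulli model and the constants below are illustrative choices.)
rng = np.random.default_rng(0)
p, m, delta, trials = 0.2, 500, 0.05, 20_000

losses = rng.binomial(1, p, size=(trials, m))   # trials independent samples S
L_S = losses.mean(axis=1)                       # empirical risk per sample
bound = L_S + np.sqrt(2 * L_S * np.log(1 / delta) / m) + 4 * np.log(1 / delta) / m

# Fraction of samples on which the bound fails; the lemma says it is at most delta.
failure_rate = np.mean(p >= bound)
print(failure_rate)
```

On this scale of $m$ the observed failure rate is far below $\delta$, since the bound is not tight for a fixed Bernoulli distribution.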
Proof Define random variables $\alpha_1,\ldots,\alpha_m$ s.t. $\alpha_i = \ell(h,z_i) - L_D(h)$. Note that $\mathbb{E}[\alpha_i] = 0$ and that
\[
\begin{aligned}
\mathbb{E}[\alpha_i^2] &= \mathbb{E}[\ell(h,z_i)^2] - 2\,L_D(h)\,\mathbb{E}[\ell(h,z_i)] + L_D(h)^2 \\
&= \mathbb{E}[\ell(h,z_i)^2] - L_D(h)^2 \\
&\le \mathbb{E}[\ell(h,z_i)^2] \\
&\le \mathbb{E}[\ell(h,z_i)] = L_D(h),
\end{aligned}
\]
where in the last inequality we used the fact that $\ell(h,z_i) \in [0,1]$ and thus $\ell(h,z_i)^2 \le \ell(h,z_i)$. Applying Bernstein's inequality over the $\alpha_i$'s yields
\[
P\left[\,\sum_{i=1}^{m} \alpha_i > t\right]
\;\le\; \exp\left( -\frac{t^2/2}{\sum_j \mathbb{E}[\alpha_j^2] + t/3} \right)
\;\le\; \exp\left( -\frac{t^2/2}{m\,L_D(h) + t/3} \right)
\;\stackrel{\mathrm{def}}{=}\; \delta.
\]
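From here, the first claim follows by inverting the definition of $\delta$: the equation $\tfrac{t^2}{2} = \log(1/\delta)\big(m L_D(h) + t/3\big)$ is quadratic in $t$, and taking its positive root gives (a sketch, using $\sqrt{x+y} \le \sqrt{x} + \sqrt{y}$):

```latex
\[
t \;=\; \frac{\log(1/\delta)}{3}
   + \sqrt{\frac{\log^2(1/\delta)}{9} + 2 m L_D(h)\log(1/\delta)}
  \;\le\; \frac{2\log(1/\delta)}{3} + \sqrt{2 m L_D(h)\log(1/\delta)}.
\]
% Since \sum_i \alpha_i = m\,(L_S(h) - L_D(h)), dividing by m yields the
% deviation term in claim 1.
```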