378 Rademacher Complexities
Furthermore, if $h^\star = \operatorname{argmin}_h L_{\mathcal{D}}(h)$ then for each $\delta \in (0,1)$, with probability of at least $1-\delta$ over the choice of $S$, we have
$$
L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star) \;\le\; \frac{2\,\mathbb{E}_{S' \sim \mathcal{D}^m} R(\ell \circ \mathcal{H} \circ S')}{\delta}.
$$
Proof  The first inequality follows directly from Lemma 26.2. The second inequality follows because for any fixed $h^\star$,
$$
L_{\mathcal{D}}(h^\star) = \mathbb{E}_S[L_S(h^\star)] \ge \mathbb{E}_S[L_S(\mathrm{ERM}_{\mathcal{H}}(S))].
$$
The third inequality follows from the previous inequality by relying on Markov's inequality (note that the random variable $L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star)$ is nonnegative).
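For readers who want the Markov step written out, here is a short derivation sketch in the notation above (the symbol $\theta$ is introduced here only for illustration):

```latex
% Let \theta denote the nonnegative random variable from the proof.
\begin{align*}
\theta &:= L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star) \;\ge\; 0,\\
\mathbb{E}_S[\theta]
  &\le \mathbb{E}_S\!\left[L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_S(\mathrm{ERM}_{\mathcal{H}}(S))\right]
   \;\le\; 2\,\mathbb{E}_{S' \sim \mathcal{D}^m} R(\ell \circ \mathcal{H} \circ S'),\\
\mathbb{P}\!\left[\theta \ge \tfrac{2\,\mathbb{E}_{S' \sim \mathcal{D}^m} R(\ell \circ \mathcal{H} \circ S')}{\delta}\right]
  &\le \frac{\mathbb{E}_S[\theta]\,\delta}{2\,\mathbb{E}_{S' \sim \mathcal{D}^m} R(\ell \circ \mathcal{H} \circ S')}
   \;\le\; \delta.
\end{align*}
```

The second line combines the argument above with Lemma 26.2, and the third line is Markov's inequality $\mathbb{P}[\theta \ge a] \le \mathbb{E}[\theta]/a$ applied with $a = 2\,\mathbb{E}_{S'} R(\ell \circ \mathcal{H} \circ S')/\delta$.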
Next, we derive bounds similar to the bounds in Theorem 26.3 with a better dependence on the confidence parameter $\delta$. To do so, we first introduce the following bounded differences concentration inequality.
Lemma 26.4 (McDiarmid's Inequality)  Let $V$ be some set and let $f : V^m \to \mathbb{R}$ be a function of $m$ variables such that for some $c > 0$, for all $i \in [m]$ and for all $x_1, \dots, x_m, x_i' \in V$ we have
$$
|f(x_1, \dots, x_m) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_m)| \le c.
$$
Let $X_1, \dots, X_m$ be $m$ independent random variables taking values in $V$. Then, with probability of at least $1-\delta$ we have
$$
|f(X_1, \dots, X_m) - \mathbb{E}[f(X_1, \dots, X_m)]| \;\le\; c\sqrt{\ln\!\left(\tfrac{2}{\delta}\right) m/2}.
$$
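As a quick sanity check, the lemma can be verified empirically for the simplest bounded-differences function, the sample mean of i.i.d. variables in $[0,1]$, where replacing one coordinate changes $f$ by at most $c = 1/m$. This is a minimal Monte Carlo sketch (the sample size, confidence level, and trial count below are arbitrary illustrative choices):

```python
import math
import random

def mcdiarmid_violation_rate(m=200, delta=0.05, trials=2000, seed=0):
    """Empirically check McDiarmid's inequality for f = sample mean.

    X_i ~ Uniform[0,1], so E[f] = 1/2, and replacing one coordinate
    changes f by at most c = 1/m.  The lemma then guarantees
    |f - E[f]| <= c * sqrt(ln(2/delta) * m / 2) with prob. >= 1 - delta.
    """
    rng = random.Random(seed)
    c = 1.0 / m
    bound = c * math.sqrt(math.log(2.0 / delta) * m / 2.0)
    violations = 0
    for _ in range(trials):
        f = sum(rng.random() for _ in range(m)) / m
        if abs(f - 0.5) > bound:
            violations += 1
    return violations / trials

rate = mcdiarmid_violation_rate()
```

The lemma guarantees that the violation rate is at most $\delta$; in practice it is far smaller, since McDiarmid's bound is not tight for any particular $f$.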
On the basis of McDiarmid's inequality we can now derive generalization bounds with a better dependence on the confidence parameter.
Theorem 26.5  Assume that for all $z$ and $h \in \mathcal{H}$ we have that $|\ell(h,z)| \le c$. Then,
- With probability of at least $1-\delta$, for all $h \in \mathcal{H}$,
$$
L_{\mathcal{D}}(h) - L_S(h) \;\le\; 2\,\mathbb{E}_{S' \sim \mathcal{D}^m} R(\ell \circ \mathcal{H} \circ S') + c\sqrt{\frac{2\ln(2/\delta)}{m}}.
$$
In particular, this holds for $h = \mathrm{ERM}_{\mathcal{H}}(S)$.
- With probability of at least $1-\delta$, for all $h \in \mathcal{H}$,
$$
L_{\mathcal{D}}(h) - L_S(h) \;\le\; 2\,R(\ell \circ \mathcal{H} \circ S) + 4c\sqrt{\frac{2\ln(4/\delta)}{m}}.
$$
In particular, this holds for $h = \mathrm{ERM}_{\mathcal{H}}(S)$.
- For any $h^\star$, with probability of at least $1-\delta$,
$$
L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star) \;\le\; 2\,R(\ell \circ \mathcal{H} \circ S) + 5c\sqrt{\frac{2\ln(8/\delta)}{m}}.
$$
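A practical appeal of the last two bounds is that $R(\ell \circ \mathcal{H} \circ S)$ depends only on the observed sample, so it can be estimated by sampling the Rademacher signs. The following is a minimal sketch, assuming a finite hypothesis class represented by its loss vectors $(\ell(h,z_1),\dots,\ell(h,z_m))$ on $S$; the toy vectors and trial count are hypothetical choices:

```python
import random

def empirical_rademacher(loss_vectors, trials=2000, seed=0):
    """Monte Carlo estimate of R(A) = E_sigma[ sup_{a in A} (1/m) sum_i sigma_i a_i ].

    loss_vectors: one vector a^h = (loss(h, z_1), ..., loss(h, z_m)) per
    hypothesis in a finite class H; the supremum becomes a max over them.
    """
    rng = random.Random(seed)
    m = len(loss_vectors[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * a for s, a in zip(sigma, vec)) / m
                     for vec in loss_vectors)
    return total / trials

# Hypothetical toy class: three 0-1 loss vectors on a sample of size m = 8.
A = [[0, 1, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [1, 1, 1, 1, 1, 1, 1, 1]]
R_hat = empirical_rademacher(A)
```

For an infinite class this direct maximization is unavailable, and one instead bounds the supremum using class-specific structure, as in the techniques developed later in the chapter.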