378 Rademacher Complexities
Furthermore, if $h^\star = \operatorname{argmin}_h L_{\mathcal{D}}(h)$ then for each $\delta \in (0,1)$, with probability of at least $1-\delta$ over the choice of $S$, we have
$$
L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star) \;\le\; \frac{2\,\mathbb{E}_{S'\sim \mathcal{D}^m}\, R(\ell \circ \mathcal{H} \circ S')}{\delta}.
$$
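The $1/\delta$ dependence in this bound comes from an application of Markov's inequality: for a nonnegative random variable $X$, $\Pr[X \ge \mathbb{E}[X]/\delta] \le \delta$. A minimal Monte Carlo sanity check of this step, assuming numpy and using a toy exponential distribution chosen purely for illustration:

```python
import numpy as np

# Markov's inequality: for nonnegative X and delta in (0, 1),
# P[X >= E[X] / delta] <= delta.
# Toy check on an exponential distribution (hypothetical choice; any
# nonnegative distribution would do).
rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=100_000)  # nonnegative, E[X] = 1

delta = 0.1
threshold = samples.mean() / delta  # E[X] / delta, estimated empirically
frac_exceeding = np.mean(samples >= threshold)
# The empirical exceedance probability should be at most delta
# (here it is far smaller, since the tail decays exponentially).
```

The bound is loose for light-tailed distributions, which is exactly why the refined bounds below replace the $1/\delta$ factor with a $\sqrt{\ln(1/\delta)}$ dependence.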
Proof  The first inequality follows directly from Lemma 26.2. The second inequality follows because, for any fixed $h^\star$,
$$
L_{\mathcal{D}}(h^\star) = \mathbb{E}_S[L_S(h^\star)] \ge \mathbb{E}_S[L_S(\mathrm{ERM}_{\mathcal{H}}(S))].
$$
The third inequality follows from the previous inequality by relying on Markov's inequality: the random variable $L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star)$ is nonnegative, and for a nonnegative random variable the probability of exceeding $1/\delta$ times its expectation is at most $\delta$.

Next, we derive bounds similar to the bounds in Theorem 26.3 with a better dependence on the confidence parameter $\delta$. To do so, we first introduce the following bounded differences concentration inequality.

lemma 26.4  (McDiarmid's Inequality)  Let $V$ be some set and let $f : V^m \to \mathbb{R}$ be a function of $m$ variables such that for some $c > 0$, for all $i \in [m]$ and for all $x_1,\ldots,x_m, x'_i \in V$ we have
$$
|f(x_1,\ldots,x_m) - f(x_1,\ldots,x_{i-1}, x'_i, x_{i+1},\ldots,x_m)| \le c.
$$
Let $X_1,\ldots,X_m$ be $m$ independent random variables taking values in $V$. Then, with probability of at least $1-\delta$ we have
$$
|f(X_1,\ldots,X_m) - \mathbb{E}[f(X_1,\ldots,X_m)]| \;\le\; c\,\sqrt{\ln\!\left(\frac{2}{\delta}\right) m/2}.
$$

On the basis of the McDiarmid inequality we can derive generalization bounds with a better dependence on the confidence parameter.

theorem 26.5  Assume that for all $z$ and $h \in \mathcal{H}$ we have that $|\ell(h,z)| \le c$.
Then,
- With probability of at least $1-\delta$, for all $h \in \mathcal{H}$,
$$
L_{\mathcal{D}}(h) - L_S(h) \;\le\; 2\,\mathbb{E}_{S'\sim \mathcal{D}^m}\, R(\ell \circ \mathcal{H} \circ S') + c\,\sqrt{\frac{2\ln(2/\delta)}{m}}.
$$
In particular, this holds for $h = \mathrm{ERM}_{\mathcal{H}}(S)$.
- With probability of at least $1-\delta$, for all $h \in \mathcal{H}$,
$$
L_{\mathcal{D}}(h) - L_S(h) \;\le\; 2\,R(\ell \circ \mathcal{H} \circ S) + 4\,c\,\sqrt{\frac{2\ln(4/\delta)}{m}}.
$$
In particular, this holds for $h = \mathrm{ERM}_{\mathcal{H}}(S)$.
- For any $h^\star$, with probability of at least $1-\delta$,
$$
L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star) \;\le\; 2\,R(\ell \circ \mathcal{H} \circ S) + 5\,c\,\sqrt{\frac{2\ln(8/\delta)}{m}}.
$$
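A practical appeal of the last two bounds is that the empirical Rademacher complexity $R(\ell \circ \mathcal{H} \circ S)$ can be estimated from the sample itself by Monte Carlo over random sign vectors. The following sketch, assuming numpy, does this for a small finite class of threshold classifiers under the 0-1 loss; the class, data, and all names are hypothetical illustrations, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
# Toy sample: points in [0, 1] labeled by the threshold 0.5 (hypothetical).
x = rng.uniform(0.0, 1.0, size=m)
y = (x > 0.5).astype(int)

# Finite hypothesis class: 21 threshold classifiers h_t(x) = 1[x > t].
thresholds = np.linspace(0.0, 1.0, 21)
# loss[j, i] = 0-1 loss of hypothesis t_j on example (x_i, y_i).
loss = np.array([((x > t).astype(int) != y).astype(float) for t in thresholds])

def empirical_rademacher(loss_matrix, n_trials=2000, rng=rng):
    """Monte Carlo estimate of R(l o H o S) =
    E_sigma [ max_h (1/m) sum_i sigma_i * l(h, z_i) ]."""
    m = loss_matrix.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_trials, m))
    # For each sign vector, take the sup over hypotheses of the correlation.
    return np.mean(np.max(sigma @ loss_matrix.T / m, axis=1))

R_hat = empirical_rademacher(loss)

# Data-dependent bound from the second bullet of Theorem 26.5
# (0-1 loss is bounded by c = 1).
delta, c = 0.05, 1.0
gap_bound = 2 * R_hat + 4 * c * np.sqrt(2 * np.log(4 / delta) / m)
```

Note that the $\sqrt{\ln(1/\delta)/m}$ terms dominate for small classes like this one, consistent with the theorem's improved dependence on $\delta$ relative to Theorem 26.3.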