26.1 The Rademacher Complexity 377Next, we note that for eachj,zjandz′jare i.i.d. variables. Therefore, we can
replace them without affecting the expectation:
S,SE′
sup
f∈F
(f(z′j)−f(zj)) +∑
i 6 =j(f(zi′)−f(zi))
=
S,SE′
sup
f∈F
(f(zj)−f(z′j)) +∑
i 6 =j(f(zi′)−f(zi))
.
(26.7)
Letσjbe a random variable such thatP[σj= 1] =P[σj=−1] = 1/2. From
Equation (26.7) we obtain that
S,SE′,σ
j
sup
f∈F
σj(f(z′j)−f(zj)) +∑
i 6 =j(f(zi′)−f(zi))
=^1
2
(l.h.s. of Equation (26.7)) +^1
2(r.h.s. of Equation (26.7))= E
S,S′
sup
f∈F
(f(zj′)−f(zj)) +∑
i 6 =j(f(z′i)−f(zi))
.
(26.8)
Repeating this for alljwe obtain that
S,SE′
[
sup
f∈F∑mi=1(f(z′i)−f(zi))]
=S,SE′,σ[
sup
f∈F∑mi=1σi(f(zi′)−f(zi))]
. (26.9)
Finally,
sup
f∈F∑
iσi(f(z′i)−f(zi))≤sup
f∈F∑
iσif(z′i) + sup
f∈F∑
i−σif(zi)and since the probability ofσis the same as the probability of−σ, the right-hand
side of Equation (26.9) can be bounded by
E
S,S′,σ[
sup
f∈F∑
iσif(z′i) + sup
f∈F∑
iσif(zi)]
= mE
S′[R(F ◦S′)] +mE
S[R(F ◦S)] = 2mE
S[R(F ◦S)].
The lemma immediately yields that, in expectation, the ERM rule finds a
hypothesis which is close to the optimal hypothesis inH.
theorem26.3 We have
S∼DEm[LD(ERMH(S))−LS(ERMH(S))]≤^2 S∼DEmR(`◦H◦S).Furthermore, for anyh?∈H
E
S∼Dm[LD(ERMH(S))−LD(h?)] ≤ 2 E
S∼Dm