Understanding Machine Learning: From Theory to Algorithms

26.1 The Rademacher Complexity

Next, we note that for each $j$, $z_j$ and $z'_j$ are i.i.d. variables. Therefore, we can
replace them without affecting the expectation:


\[
\mathbb{E}_{S,S'}\left[\sup_{f\in F}\left( (f(z'_j)-f(z_j)) + \sum_{i\neq j}(f(z'_i)-f(z_i)) \right)\right]
= \mathbb{E}_{S,S'}\left[\sup_{f\in F}\left( (f(z_j)-f(z'_j)) + \sum_{i\neq j}(f(z'_i)-f(z_i)) \right)\right].
\tag{26.7}
\]

Let $\sigma_j$ be a random variable such that $\mathbb{P}[\sigma_j = 1] = \mathbb{P}[\sigma_j = -1] = 1/2$. From
Equation (26.7) we obtain that


\[
\begin{aligned}
\mathbb{E}_{S,S',\sigma_j}\left[\sup_{f\in F}\left( \sigma_j(f(z'_j)-f(z_j)) + \sum_{i\neq j}(f(z'_i)-f(z_i)) \right)\right]
&= \tfrac{1}{2}\,\bigl(\text{l.h.s. of Equation (26.7)}\bigr) + \tfrac{1}{2}\,\bigl(\text{r.h.s. of Equation (26.7)}\bigr) \\
&= \mathbb{E}_{S,S'}\left[\sup_{f\in F}\left( (f(z'_j)-f(z_j)) + \sum_{i\neq j}(f(z'_i)-f(z_i)) \right)\right].
\end{aligned}
\tag{26.8}
\]

Repeating this for all $j$ we obtain that


\[
\mathbb{E}_{S,S'}\left[\sup_{f\in F}\sum_{i=1}^{m}(f(z'_i)-f(z_i))\right]
= \mathbb{E}_{S,S',\sigma}\left[\sup_{f\in F}\sum_{i=1}^{m}\sigma_i(f(z'_i)-f(z_i))\right].
\tag{26.9}
\]
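The identity in Equation (26.9) can also be checked numerically. The sketch below is a minimal Monte Carlo illustration (the finite class, its tabulation on a grid of 100 possible values of $z$, and all sample sizes are invented for the example, not taken from the text): it estimates both sides of Equation (26.9) for a small finite class, and the two estimates agree up to sampling error.

```python
import numpy as np

# Monte Carlo check of Equation (26.9) for an invented finite class F.
# Each f in F is tabulated on a grid of 100 possible values of z, so f(z) is a
# table lookup; S and S' are i.i.d. samples of indices into that grid.
rng = np.random.default_rng(1)
F_table = rng.standard_normal((5, 100))        # 5 functions, 100 grid points

m, n = 3, 200_000
z = rng.integers(0, 100, size=(n, m))          # n independent draws of the sample S
zp = rng.integers(0, 100, size=(n, m))         # matching draws of the ghost sample S'
diff = F_table[:, zp] - F_table[:, z]          # shape (5, n, m): f(z'_i) - f(z_i)

lhs = diff.sum(axis=2).max(axis=0).mean()      # E[ sup_f sum_i (f(z'_i) - f(z_i)) ]
sigma = rng.choice([-1.0, 1.0], size=(n, m))   # independent Rademacher signs
rhs = (sigma * diff).sum(axis=2).max(axis=0).mean()

print(lhs, rhs)   # the two estimates coincide up to Monte Carlo error
```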

Finally,


\[
\sup_{f\in F}\sum_{i}\sigma_i(f(z'_i)-f(z_i)) \;\le\; \sup_{f\in F}\sum_{i}\sigma_i f(z'_i) + \sup_{f\in F}\sum_{i}(-\sigma_i) f(z_i)
\]

and since the probability of $\sigma$ is the same as the probability of $-\sigma$, the right-hand
side of Equation (26.9) can be bounded by


\[
\mathbb{E}_{S,S',\sigma}\left[\sup_{f\in F}\sum_{i}\sigma_i f(z'_i) + \sup_{f\in F}\sum_{i}\sigma_i f(z_i)\right]
= m\,\mathbb{E}_{S'}[R(F\circ S')] + m\,\mathbb{E}_{S}[R(F\circ S)]
= 2m\,\mathbb{E}_{S}[R(F\circ S)].
\]
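The final equality expresses the bound through the Rademacher complexity $R(F\circ S) = \frac{1}{m}\,\mathbb{E}_{\sigma}\bigl[\sup_{f\in F}\sum_{i}\sigma_i f(z_i)\bigr]$. For a finite class this quantity is easy to approximate by sampling $\sigma$; the sketch below is a minimal illustration, where the helper name and the representation of $F$ as a matrix of values $f(z_i)$ are assumptions made for the example rather than notation from the text.

```python
import numpy as np

def empirical_rademacher(values, n_sigma=2000, rng=None):
    """Monte Carlo estimate of R(F o S) = (1/m) E_sigma[ sup_{f in F} sum_i sigma_i f(z_i) ].

    `values` is a |F| x m array whose rows are (f(z_1), ..., f(z_m)) for each f
    in a finite class F, evaluated on the fixed sample S.  (Hypothetical helper.)
    """
    rng = np.random.default_rng() if rng is None else rng
    n_f, m = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_sigma, m))   # random sign vectors
    sups = (sigma @ values.T).max(axis=1)                # sup over f, one per sigma draw
    return sups.mean() / m
```

For example, `empirical_rademacher(np.sign(np.random.randn(10, 50)))` estimates the complexity of ten fixed $\pm 1$-valued functions on a sample of size $m = 50$.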

The lemma immediately yields that, in expectation, the ERM rule finds a hypothesis which is
close to the optimal hypothesis in $\mathcal{H}$.


theorem 26.3  We have
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\bigl[L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_S(\mathrm{ERM}_{\mathcal{H}}(S))\bigr] \;\le\; 2\,\mathbb{E}_{S\sim\mathcal{D}^m} R(\ell\circ\mathcal{H}\circ S).
\]
Furthermore, for any $h^\star\in\mathcal{H}$,
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\bigl[L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^\star)\bigr] \;\le\; 2\,\mathbb{E}_{S\sim\mathcal{D}^m} R(\ell\circ\mathcal{H}\circ S).
\]
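As a rough numerical illustration of the first inequality, the following sketch (an invented toy experiment, not from the text: twenty-one threshold classifiers on $[0,1]$, 0-1 loss, 10% label noise) estimates the expected generalization gap of ERM and compares it with twice the expected Rademacher complexity of $\ell\circ\mathcal{H}\circ S$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: x ~ Uniform[0, 1], label y = 1[x >= 0.5] flipped with
# probability 0.1, and H = {h_t(x) = 1[x >= t]} for 21 equally spaced thresholds.
thresholds = np.linspace(0.0, 1.0, 21)

def losses(x, y):
    """|H| x m matrix of 0-1 losses: one row per threshold hypothesis."""
    preds = (x[None, :] >= thresholds[:, None]).astype(float)
    return (preds != y[None, :]).astype(float)

def sample(m):
    x = rng.uniform(size=m)
    y = ((x >= 0.5) ^ (rng.uniform(size=m) < 0.1)).astype(float)
    return x, y

L_D = losses(*sample(100_000)).mean(axis=1)     # approximate true risk L_D(h) for each h

m, n_trials, gap, rad = 50, 200, 0.0, 0.0
for _ in range(n_trials):
    L = losses(*sample(m))                      # ell o H o S as an |H| x m array
    erm = L.mean(axis=1).argmin()               # index of ERM_H(S)
    gap += L_D[erm] - L[erm].mean()             # L_D(ERM) - L_S(ERM)
    sigma = rng.choice([-1.0, 1.0], size=(500, m))
    rad += (sigma @ L.T).max(axis=1).mean() / m # empirical R(ell o H o S)

print(f"E[L_D - L_S] ~ {gap / n_trials:.3f}  <=  2 E[R] ~ {2 * rad / n_trials:.3f}")
```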