Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
26.1 The Rademacher Complexity 381

and therefore


mR(A′)≤log

(


a∈A′

∏m

i=1

exp

(

a^2 i
2

))

= log

(


a∈A′

exp

(

‖a‖^2 / 2

)

)

≤log

(

|A′|maxa∈A′exp

(

‖a‖^2 / 2

))

= log(|A′|) + maxa∈A′(‖a‖^2 /2).

SinceR(A) =λ^1 R(A′) we obtain from the equation that


R(A)≤

log(|A|) +λ^2 maxa∈A(‖a‖^2 /2)
λm

.

Settingλ=



2 log(|A|)/maxa∈A‖a‖^2 and rearranging terms we conclude our
proof.


The following lemma shows that composingAwith a Lipschitz function does
not blow up the Rademacher complexity. The proof is due to Kakade and Tewari.


lemma26.9 (Contraction lemma) For eachi∈[m], letφi:R→Rbe aρ-
Lipschitz function, namely for allα,β∈Rwe have|φi(α)−φi(β)| ≤ρ|α−β|.
Fora∈Rmletφ(a)denote the vector(φ 1 (a 1 ),...,φm(ym)). Letφ◦A={φ(a) :
a∈A}. Then,


R(φ◦A)≤ρR(A).

Proof For simplicity, we prove the lemma for the caseρ= 1. The caseρ 6 =
1 will follow by definingφ′ =^1 ρφand then using Lemma 26.6. Let Ai =
{(a 1 ,...,ai− 1 ,φi(ai),ai+1,...,am) :a∈A}. Clearly, it suffices to prove that
for any setAand alliwe haveR(Ai)≤R(A). Without loss of generality we will
prove the latter claim fori= 1 and to simplify notation we omit the subscript
fromφ 1. We have


mR(A 1 ) =Eσ

[

sup
a∈A 1

∑m

i=1

σiai

]

=Eσ

[

sup
a∈A

σ 1 φ(a 1 ) +

∑m

i=2

σiai

]

=

1

2

σ E
2 ,...,σm

[

sup
a∈A

(

φ(a 1 ) +

∑m

i=2

σiai

)

+ sup
a∈A

(

−φ(a 1 ) +

∑m

i=2

σiai

)]

=

1

2

σ E
2 ,...,σm

[

sup
a,a′∈A

(

φ(a 1 )−φ(a′ 1 ) +

∑m

i=2

σiai+

∑m

i=2

σia′i

)]


1

2

σ E
2 ,...,σm

[

sup
a,a′∈A

(

|a 1 −a′ 1 |+

∑m

i=2

σiai+

∑m

i=2

σia′i

)]

, (26.12)

where in the last inequality we used the assumption thatφis Lipschitz. Next,
we note that the absolute value on|a 1 −a′ 1 |in the preceding expression can

Free download pdf