382 Rademacher Complexities
be omitted since bothaanda′are from the same setAand the rest of the
expression in the supremum is not affected by replacingaanda′. Therefore,
mR(A 1 ) ≤^1
2
σ E
2 ,...,σm
[
sup
a,a′∈A
(
a 1 −a′ 1 +
∑m
i=2
σiai+
∑m
i=2
σia′i
)]
. (26.13)
But, using the same equalities as in Equation (26.12), it is easy to see that the
right-hand side of Equation (26.13) exactly equalsmR(A), which concludes our
proof.
26.2 Rademacher Complexity of Linear Classes
In this section we analyze the Rademacher complexity of linear classes. To sim-
plify the derivation we first define the following two classes:
H 1 ={x7→〈w,x〉:‖w‖ 1 ≤ 1 } , H 2 ={x7→〈w,x〉:‖w‖ 2 ≤ 1 }. (26.14)
The following lemma bounds the Rademacher complexity ofH 2. We allow
thexito be vectors in any Hilbert space (even infinite dimensional), and the
bound does not depend on the dimensionality of the Hilbert space. This property
becomes useful when analyzing kernel methods.
lemma26.10 LetS= (x 1 ,...,xm)be vectors in a Hilbert space. Define:H 2 ◦
S={(〈w,x 1 〉,...,〈w,xm〉) :‖w‖ 2 ≤ 1 }. Then,
R(H 2 ◦S) ≤ max√i‖xi‖^2
m
.
Proof Using Cauchy-Schwartz inequality we know that for any vectorsw,vwe
have〈w,v〉≤‖w‖‖v‖. Therefore,
mR(H 2 ◦S) =Eσ
[
sup
a∈H 2 ◦S
∑m
i=1
σiai
]
(26.15)
=Eσ
[
sup
w:‖w‖≤ 1
∑m
i=1
σi〈w,xi〉
]
=Eσ
[
sup
w:‖w‖≤ 1
〈w,
∑m
i=1
σixi〉
]
≤Eσ
[
‖
∑m
i=1
σixi‖ 2
]
.
Next, using Jensen’s inequality we have that
Eσ
[∥∥
∥∥
∥
∑m
i=1
σixi
∥∥
∥∥
∥ 2
]
=Eσ
∥∥
∥∥
∥
∑m
i=1
σixi
∥∥
∥∥
∥
2
2
1 / 2
≤
E
σ
∥∥
∥∥
∥
∑m
i=1
σixi
∥∥
∥∥
∥
2
2
1 / 2
(26.16).