176 Regularization and Stability
DenotefS(w) =LS(w) +λ‖w‖^2 , and based on Lemma 13.5 we know thatfSis
(2λ)-strongly convex. Relying on part 3 of the lemma, it follows that for anyv,
fS(v)−fS(A(S))≥λ‖v−A(S)‖^2. (13.7)
On the other hand, for anyvandu, and for alli, we have
fS(v)−fS(u) =LS(v) +λ‖v‖^2 −(LS(u) +λ‖u‖^2 ) (13.8)
=LS(i)(v) +λ‖v‖^2 −(LS(i)(u) +λ‖u‖^2 )
+
`(v,zi)−`(u,zi)
m
+
`(u,z′)−`(v,z′)
m
In particular, choosingv=A(S(i)),u=A(S), and using the fact thatvmini-
mizesLS(i)(w) +λ‖w‖^2 , we obtain that
fS(A(S(i)))−fS(A(S))≤`(A(S
(i)),zi)−`(A(S),zi)
m
+`(A(S),z
′)−`(A(S(i)),z′)
m
(13.9)
Combining this with Equation (13.7) we obtain that
λ‖A(S(i))−A(S)‖^2 ≤
`(A(S(i)),zi)−`(A(S),zi)
m
+
`(A(S),z′)−`(A(S(i)),z′)
m
(13.10)
The two subsections that follow continue the stability analysis for either Lip-
schitz or smooth loss functions. For both families of loss functions we show that
RLM is stable and therefore it does not overfit.
13.3.1 Lipschitz Loss
If the loss function,`(·,zi), isρ-Lipschitz, then by the definition of Lipschitzness,
`(A(S(i)),zi)−`(A(S),zi)≤ρ‖A(S(i))−A(S)‖. (13.11)
Similarly,
`(A(S),z′)−`(A(S(i)),z′)≤ρ‖A(S(i))−A(S)‖.
Plugging these inequalities into Equation (13.10) we obtain
λ‖A(S(i))−A(S)‖^2 ≤
2 ρ‖A(S(i))−A(S)‖
m
,
which yields
‖A(S(i))−A(S)‖≤^2 ρ
λm
.
Plugging the preceding back into Equation (13.11) we conclude that
`(A(S(i)),zi)−`(A(S),zi)≤
2 ρ^2
λm
.
Since this holds for anyS,z′,iwe immediately obtain: