176 Regularization and Stability
DenotefS(w) =LS(w) +λ‖w‖^2 , and based on Lemma 13.5 we know thatfSis
(2λ)-strongly convex. Relying on part 3 of the lemma, it follows that for anyv,fS(v)−fS(A(S))≥λ‖v−A(S)‖^2. (13.7)On the other hand, for anyvandu, and for alli, we havefS(v)−fS(u) =LS(v) +λ‖v‖^2 −(LS(u) +λ‖u‖^2 ) (13.8)
=LS(i)(v) +λ‖v‖^2 −(LS(i)(u) +λ‖u‖^2 )+`(v,zi)−`(u,zi)
m+
`(u,z′)−`(v,z′)
m
In particular, choosingv=A(S(i)),u=A(S), and using the fact thatvmini-
mizesLS(i)(w) +λ‖w‖^2 , we obtain thatfS(A(S(i)))−fS(A(S))≤`(A(S(i)),zi)−`(A(S),zi)
m+`(A(S),z′)−`(A(S(i)),z′)
m
(13.9)
Combining this with Equation (13.7) we obtain thatλ‖A(S(i))−A(S)‖^2 ≤`(A(S(i)),zi)−`(A(S),zi)
m+
`(A(S),z′)−`(A(S(i)),z′)
m
(13.10)
The two subsections that follow continue the stability analysis for either Lip-
schitz or smooth loss functions. For both families of loss functions we show that
RLM is stable and therefore it does not overfit.13.3.1 Lipschitz Loss
If the loss function,`(·,zi), isρ-Lipschitz, then by the definition of Lipschitzness,`(A(S(i)),zi)−`(A(S),zi)≤ρ‖A(S(i))−A(S)‖. (13.11)Similarly,
`(A(S),z′)−`(A(S(i)),z′)≤ρ‖A(S(i))−A(S)‖.Plugging these inequalities into Equation (13.10) we obtainλ‖A(S(i))−A(S)‖^2 ≤2 ρ‖A(S(i))−A(S)‖
m,
which yields‖A(S(i))−A(S)‖≤^2 ρ
λm.
Plugging the preceding back into Equation (13.11) we conclude that`(A(S(i)),zi)−`(A(S),zi)≤2 ρ^2
λm.
Since this holds for anyS,z′,iwe immediately obtain: