Regularization and Stability


Denote $f_S(w) = L_S(w) + \lambda\|w\|^2$, and based on Lemma 13.5 we know that $f_S$ is $(2\lambda)$-strongly convex. Relying on part 3 of the lemma, it follows that for any $v$,

$$
f_S(v) - f_S(A(S)) \ge \lambda\,\|v - A(S)\|^2. \tag{13.7}
$$
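For completeness, here is a short reconstruction of the step behind Equation (13.7); the precise statement of Lemma 13.5 appears earlier in the chapter, and the sketch below assumes only the standard definition of strong convexity. $(2\lambda)$-strong convexity of $f_S$ means that for all $v$, $u$ and $\alpha \in (0,1)$,

$$
f_S(\alpha v + (1-\alpha)u) \le \alpha f_S(v) + (1-\alpha) f_S(u) - \lambda\,\alpha(1-\alpha)\,\|v-u\|^2 .
$$

Taking $u = A(S)$ and using that $A(S)$ minimizes $f_S$, the left-hand side is at least $f_S(A(S))$; rearranging gives $\alpha\,(f_S(v) - f_S(A(S))) \ge \lambda\,\alpha(1-\alpha)\,\|v - A(S)\|^2$, and dividing by $\alpha$ and letting $\alpha \to 0$ yields Equation (13.7).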

On the other hand, for any $v$ and $u$, and for all $i$, we have

$$
\begin{aligned}
f_S(v) - f_S(u) ={}& L_S(v) + \lambda\|v\|^2 - \left(L_S(u) + \lambda\|u\|^2\right) \\
={}& L_{S^{(i)}}(v) + \lambda\|v\|^2 - \left(L_{S^{(i)}}(u) + \lambda\|u\|^2\right) \\
& + \frac{\ell(v,z_i) - \ell(u,z_i)}{m} + \frac{\ell(u,z') - \ell(v,z')}{m}.
\end{aligned}
\tag{13.8}
$$
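The second equality is just bookkeeping: $S^{(i)}$ is $S$ with the $i$th example $z_i$ replaced by $z'$, so the two empirical risks differ in a single summand. Explicitly,

$$
L_S(w) = \frac{1}{m}\sum_{j=1}^{m} \ell(w, z_j) = L_{S^{(i)}}(w) + \frac{\ell(w, z_i) - \ell(w, z')}{m},
$$

and applying this identity to both $L_S(v)$ and $L_S(u)$ (the $\lambda\|\cdot\|^2$ terms are unaffected) gives the decomposition above.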
In particular, choosing $v = A(S^{(i)})$, $u = A(S)$, and using the fact that $v$ minimizes $L_{S^{(i)}}(w) + \lambda\|w\|^2$ (so the first difference on the right-hand side of Equation (13.8) is nonpositive), we obtain that

$$
f_S(A(S^{(i)})) - f_S(A(S)) \le \frac{\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)}{m} + \frac{\ell(A(S), z') - \ell(A(S^{(i)}), z')}{m}. \tag{13.9}
$$
Combining this with Equation (13.7) we obtain that

$$
\lambda\,\|A(S^{(i)}) - A(S)\|^2 \le \frac{\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)}{m} + \frac{\ell(A(S), z') - \ell(A(S^{(i)}), z')}{m}. \tag{13.10}
$$
The two subsections that follow continue the stability analysis, one for Lipschitz loss functions and one for smooth loss functions. For both families we show that RLM is stable and therefore does not overfit.

13.3.1 Lipschitz Loss


If the loss function, $\ell(\cdot, z_i)$, is $\rho$-Lipschitz, then by the definition of Lipschitzness,

$$
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \le \rho\,\|A(S^{(i)}) - A(S)\|. \tag{13.11}
$$

Similarly,

$$
\ell(A(S), z') - \ell(A(S^{(i)}), z') \le \rho\,\|A(S^{(i)}) - A(S)\|.
$$
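Both inequalities are instances of the same definition, recalled here for convenience (Lipschitzness is with respect to the first argument of the loss):

$$
|\ell(w_1, z) - \ell(w_2, z)| \le \rho\,\|w_1 - w_2\| \qquad \text{for all } w_1, w_2,
$$

applied once with $z = z_i$ and once with $z = z'$.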

Plugging these inequalities into Equation (13.10) we obtain

$$
\lambda\,\|A(S^{(i)}) - A(S)\|^2 \le \frac{2\rho\,\|A(S^{(i)}) - A(S)\|}{m},
$$

which yields

$$
\|A(S^{(i)}) - A(S)\| \le \frac{2\rho}{\lambda m}.
$$

Plugging the preceding back into Equation (13.11) we conclude that

$$
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \le \frac{2\rho^2}{\lambda m}.
$$

Since this holds for any $S$, $z'$, and $i$, we immediately obtain that the RLM rule with the regularizer $\lambda\|w\|^2$ is on-average-replace-one-stable with rate $\frac{2\rho^2}{\lambda m}$, and hence, by Theorem 13.2, its expected generalization gap is at most $\frac{2\rho^2}{\lambda m}$.
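As a sanity check, the bound can also be tested numerically. The following is a minimal sketch, not code from the book: it assumes the hinge loss (convex and $\rho$-Lipschitz with $\rho = 1$ once every $\|x\| \le 1$) and uses scipy's derivative-free Powell method as a crude stand-in for an exact RLM solver; the names `sample`, `f_S`, and `rlm` are chosen here for illustration.

```python
# Numerical sketch of the stability bound ||A(S^(i)) - A(S)|| <= 2*rho/(lam*m),
# under assumptions NOT taken from the book: hinge loss and an approximate solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, d, lam = 50, 5, 0.1   # sample size, dimension, regularization parameter
rho = 1.0                # Lipschitz constant of the hinge loss when ||x|| <= 1

def sample(n):
    """Draw n labeled examples with ||x|| = 1, so the hinge loss is 1-Lipschitz in w."""
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = np.sign(rng.normal(size=n))
    return X, y

def f_S(w, X, y):
    """RLM objective f_S(w) = L_S(w) + lam * ||w||^2 with hinge loss."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))
    return hinge.mean() + lam * np.dot(w, w)

def rlm(X, y):
    """A(S): approximate minimizer of the RLM objective (derivative-free Powell)."""
    return minimize(f_S, np.zeros(d), args=(X, y), method="Powell").x

X, y = sample(m)
w_S = rlm(X, y)                      # A(S)

# Build S^(i): replace example i with a freshly drawn example z'.
i = 0
Xp, yp = X.copy(), y.copy()
Xp[i:i+1], yp[i:i+1] = sample(1)
w_Si = rlm(Xp, yp)                   # A(S^(i))

# The derivation predicts ||A(S^(i)) - A(S)|| <= 2*rho/(lam*m) = 0.4 here.
print(np.linalg.norm(w_Si - w_S), "<=", 2 * rho / (lam * m))
```

With $\lambda = 0.1$ and $m = 50$ the bound evaluates to $2\rho/(\lambda m) = 0.4$; the measured distance between the two minimizers should come out well below it, since the bound holds for every pair $(S, z')$ rather than only in expectation.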