Understanding Machine Learning: From Theory to Algorithms

13.3 Tikhonov Regularization as a Stabilizer

corollary 13.6  Assume that the loss function is convex and $\rho$-Lipschitz. Then, the RLM rule with the regularizer $\lambda\|w\|^2$ is on-average-replace-one-stable with rate $\frac{2\rho^2}{\lambda m}$. It follows (using Theorem 13.2) that

\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2\rho^2}{\lambda m}.
\]
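
To make the corollary concrete, here is a minimal numerical sketch (an added aside, not from the book). It estimates the average replace-one loss change of the RLM rule for the logistic loss, which is convex and $\rho$-Lipschitz with $\rho$ the largest instance norm, and compares it with the $2\rho^2/(\lambda m)$ rate; the data, the solver, and every name in the snippet are illustrative choices rather than anything prescribed by the text.

```python
# Minimal sketch: estimate the replace-one stability of the RLM rule
#   A(S) = argmin_w  L_S(w) + lam * ||w||^2
# for the logistic loss (convex, ||x||-Lipschitz in w) and compare the
# observed average gap with the 2*rho^2/(lam*m) rate of Corollary 13.6.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, d, lam = 50, 5, 0.1

def logistic_loss(w, X, y):
    # ell(w, (x, y)) = log(1 + exp(-y * <w, x>))
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def rlm(X, y):
    # Regularized Loss Minimization with the Tikhonov regularizer lam*||w||^2
    obj = lambda w: logistic_loss(w, X, y) + lam * np.dot(w, w)
    return minimize(obj, np.zeros(d), method="L-BFGS-B").x

X = rng.normal(size=(m, d))
y = rng.choice([-1.0, 1.0], size=m)
xp, yp = rng.normal(size=d), 1.0          # replacement example z' = (xp, yp)
rho = max(np.linalg.norm(X, axis=1).max(), np.linalg.norm(xp))

w_S = rlm(X, y)
gaps = []
for i in range(m):
    Xi, yi = X.copy(), y.copy()
    Xi[i], yi[i] = xp, yp                 # S^(i): the i-th example replaced by z'
    w_Si = rlm(Xi, yi)
    # replace-one term: ell(A(S^(i)), z_i) - ell(A(S), z_i)
    gaps.append(logistic_loss(w_Si, X[i:i+1], y[i:i+1])
                - logistic_loss(w_S, X[i:i+1], y[i:i+1]))

print("average replace-one gap :", np.mean(gaps))
print("bound 2*rho^2/(lam*m)   :", 2 * rho**2 / (lam * m))
```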

13.3.2 Smooth and Nonnegative Loss


If the loss is $\beta$-smooth and nonnegative then it is also self-bounded (see Section 12.1):

\[
\|\nabla f(w)\|^2 \le 2\beta f(w). \tag{13.12}
\]
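
As a concrete illustration (an added example, not part of the original text), the squared loss on a single example $z=(x,y)$ satisfies (13.12) with equality:

\[
\ell(w,(x,y)) = (\langle w,x\rangle - y)^2, \qquad \nabla_w \ell(w,(x,y)) = 2(\langle w,x\rangle - y)\,x,
\]

so $\|\nabla_w \ell(w,(x,y))\|^2 = 4(\langle w,x\rangle - y)^2\|x\|^2 = 2\,(2\|x\|^2)\,\ell(w,(x,y))$; the squared loss is nonnegative, $\beta$-smooth with $\beta = 2\|x\|^2$, and hence self-bounded.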

We further assume that $\lambda \ge \frac{2\beta}{m}$, or, in other words, that $\beta \le \lambda m/2$. By the smoothness assumption we have that


\[
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \;\le\; \big\langle \nabla\ell(A(S), z_i),\, A(S^{(i)}) - A(S)\big\rangle + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2. \tag{13.13}
\]
Using the Cauchy-Schwarz inequality and Equation (12.6) we further obtain that


\[
\begin{aligned}
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)
&\le \big\|\nabla\ell(A(S), z_i)\big\|\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2 \\
&\le \sqrt{2\beta\,\ell(A(S), z_i)}\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2.
\end{aligned} \tag{13.14}
\]

By a symmetric argument it holds that,


\[
\ell(A(S), z') - \ell(A(S^{(i)}), z') \;\le\; \sqrt{2\beta\,\ell(A(S^{(i)}), z')}\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2.
\]
Plugging these inequalities into Equation (13.10) and rearranging terms we obtain that

\[
\big\|A(S^{(i)}) - A(S)\big\| \;\le\; \frac{\sqrt{2\beta}}{\lambda m - \beta}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
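
To spell out the rearranging step (an added aside; we assume here that Equation (13.10), stated earlier in this section, bounds $\lambda\|A(S^{(i)})-A(S)\|^2$ by $\frac{1}{m}\big(\ell(A(S^{(i)}),z_i)-\ell(A(S),z_i)\big) + \frac{1}{m}\big(\ell(A(S),z')-\ell(A(S^{(i)}),z')\big)$): writing $\Delta = \|A(S^{(i)})-A(S)\|$ and summing the two displayed inequalities gives

\[
\lambda \Delta^2 \;\le\; \frac{\sqrt{2\beta}}{m}\left(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\right)\Delta + \frac{\beta}{m}\,\Delta^2,
\]

and dividing both sides by $\big(\lambda - \tfrac{\beta}{m}\big)\Delta$ (the case $\Delta = 0$ is trivial, and $\lambda > \beta/m$ by assumption) yields the displayed bound.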

Combining the preceding with the assumption $\beta \le \lambda m/2$ yields


\[
\big\|A(S^{(i)}) - A(S)\big\| \;\le\; \frac{\sqrt{8\beta}}{\lambda m}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
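
As a sanity check on this last bound, here is a minimal numerical sketch (an added aside, not from the book). For the squared loss, which is nonnegative and $\beta$-smooth with $\beta = 2\max\|x\|^2$, the RLM rule is ridge regression and has a closed form, so $A(S)$ and $A(S^{(i)})$ can be computed exactly; the data, dimensions, and the choice of $\lambda$ below are illustrative, picked only so that the assumption $\lambda \ge 2\beta/m$ holds.

```python
# Minimal sketch: for the squared loss ell(w,(x,y)) = (<w,x> - y)^2 the RLM
# rule is ridge regression, so we can check the derived stability bound
#   ||A(S^(i)) - A(S)|| <= sqrt(8*beta)/(lam*m) * (sqrt(ell(A(S),z_i)) + sqrt(ell(A(S^(i)),z')))
# directly on a few replaced indices.
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 5

def rlm(X, y, lam):
    # argmin_w (1/m)*||X w - y||^2 + lam*||w||^2  (closed-form ridge solution)
    return np.linalg.solve(X.T @ X / m + lam * np.eye(X.shape[1]), X.T @ y / m)

X = rng.normal(size=(m, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)
xp, yp = rng.normal(size=d), 0.0                 # replacement example z' = (xp, yp)

beta = 2 * max(np.linalg.norm(X, axis=1).max(), np.linalg.norm(xp)) ** 2
lam = 2 * beta / m                               # satisfies the assumption lam >= 2*beta/m

w_S = rlm(X, y, lam)
for i in range(3):                               # check a few replaced indices
    Xi, yi = X.copy(), y.copy()
    Xi[i], yi[i] = xp, yp                        # S^(i): the i-th example replaced by z'
    w_Si = rlm(Xi, yi, lam)
    lhs = np.linalg.norm(w_Si - w_S)
    rhs = np.sqrt(8 * beta) / (lam * m) * (abs(X[i] @ w_S - y[i]) + abs(xp @ w_Si - yp))
    print(f"i={i}:  ||A(S^(i)) - A(S)|| = {lhs:.4f}  <=  bound = {rhs:.4f}")
```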