Understanding Machine Learning: From Theory to Algorithms

13.3 Tikhonov Regularization as a Stabilizer

corollary 13.6  Assume that the loss function is convex and $\rho$-Lipschitz. Then, the RLM rule with the regularizer $\lambda\|w\|^2$ is on-average-replace-one-stable with rate $\frac{2\rho^2}{\lambda m}$. It follows (using Theorem 13.2) that

\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2\rho^2}{\lambda m}.
\]
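
To make the corollary concrete, here is a minimal numerical sketch (an added aside, not from the book). It estimates the average replace-one loss change of the RLM rule for the logistic loss, which is convex and $\rho$-Lipschitz with $\rho$ the largest instance norm, and compares it with the $2\rho^2/(\lambda m)$ rate; the data, the solver, and every name in the snippet are illustrative choices rather than anything prescribed by the text.

```python
# Minimal sketch: estimate the replace-one stability of the RLM rule
#   A(S) = argmin_w  L_S(w) + lam * ||w||^2
# for the logistic loss (convex, ||x||-Lipschitz in w) and compare the
# observed average gap with the 2*rho^2/(lam*m) rate of Corollary 13.6.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, d, lam = 50, 5, 0.1

def logistic_loss(w, X, y):
    # ell(w, (x, y)) = log(1 + exp(-y * <w, x>))
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def rlm(X, y):
    # Regularized Loss Minimization with the Tikhonov regularizer lam*||w||^2
    obj = lambda w: logistic_loss(w, X, y) + lam * np.dot(w, w)
    return minimize(obj, np.zeros(d), method="L-BFGS-B").x

X = rng.normal(size=(m, d))
y = rng.choice([-1.0, 1.0], size=m)
xp, yp = rng.normal(size=d), 1.0          # replacement example z' = (xp, yp)
rho = max(np.linalg.norm(X, axis=1).max(), np.linalg.norm(xp))

w_S = rlm(X, y)
gaps = []
for i in range(m):
    Xi, yi = X.copy(), y.copy()
    Xi[i], yi[i] = xp, yp                 # S^(i): the i-th example replaced by z'
    w_Si = rlm(Xi, yi)
    # replace-one term: ell(A(S^(i)), z_i) - ell(A(S), z_i)
    gaps.append(logistic_loss(w_Si, X[i:i+1], y[i:i+1])
                - logistic_loss(w_S, X[i:i+1], y[i:i+1]))

print("average replace-one gap :", np.mean(gaps))
print("bound 2*rho^2/(lam*m)   :", 2 * rho**2 / (lam * m))
```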

13.3.2 Smooth and Nonnegative Loss


If the loss is $\beta$-smooth and nonnegative then it is also self-bounded (see Section 12.1):

\[
\|\nabla f(w)\|^2 \le 2\beta f(w). \tag{13.12}
\]
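
As a concrete illustration (an added example, not part of the original text), the squared loss on a single example $z=(x,y)$ satisfies (13.12) with equality:

\[
\ell(w,(x,y)) = (\langle w,x\rangle - y)^2, \qquad \nabla_w \ell(w,(x,y)) = 2(\langle w,x\rangle - y)\,x,
\]

so $\|\nabla_w \ell(w,(x,y))\|^2 = 4(\langle w,x\rangle - y)^2\|x\|^2 = 2\,(2\|x\|^2)\,\ell(w,(x,y))$; the squared loss is nonnegative, $\beta$-smooth with $\beta = 2\|x\|^2$, and hence self-bounded.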

We further assume that $\lambda \ge \frac{2\beta}{m}$, or, in other words, that $\beta \le \lambda m/2$. By the smoothness assumption we have that


\[
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \;\le\; \big\langle \nabla\ell(A(S), z_i),\, A(S^{(i)}) - A(S)\big\rangle + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2. \tag{13.13}
\]
Using the Cauchy-Schwarz inequality and Equation (12.6) we further obtain that


\[
\begin{aligned}
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)
&\le \big\|\nabla\ell(A(S), z_i)\big\|\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2 \\
&\le \sqrt{2\beta\,\ell(A(S), z_i)}\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2.
\end{aligned} \tag{13.14}
\]

By a symmetric argument it holds that,


\[
\ell(A(S), z') - \ell(A(S^{(i)}), z') \;\le\; \sqrt{2\beta\,\ell(A(S^{(i)}), z')}\,\big\|A(S^{(i)}) - A(S)\big\| + \frac{\beta}{2}\,\big\|A(S^{(i)}) - A(S)\big\|^2.
\]
Plugging these inequalities into Equation (13.10) and rearranging terms we obtain that

\[
\big\|A(S^{(i)}) - A(S)\big\| \;\le\; \frac{\sqrt{2\beta}}{\lambda m - \beta}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
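
To spell out the rearranging step (an added aside; we assume here that Equation (13.10), stated earlier in this section, bounds $\lambda\|A(S^{(i)})-A(S)\|^2$ by $\frac{1}{m}\big(\ell(A(S^{(i)}),z_i)-\ell(A(S),z_i)\big) + \frac{1}{m}\big(\ell(A(S),z')-\ell(A(S^{(i)}),z')\big)$): writing $\Delta = \|A(S^{(i)})-A(S)\|$ and summing the two displayed inequalities gives

\[
\lambda \Delta^2 \;\le\; \frac{\sqrt{2\beta}}{m}\left(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\right)\Delta + \frac{\beta}{m}\,\Delta^2,
\]

and dividing both sides by $\big(\lambda - \tfrac{\beta}{m}\big)\Delta$ (the case $\Delta = 0$ is trivial, and $\lambda > \beta/m$ by assumption) yields the displayed bound.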

Combining the preceding with the assumption $\beta \le \lambda m/2$ yields


\[
\big\|A(S^{(i)}) - A(S)\big\| \;\le\; \frac{\sqrt{8\beta}}{\lambda m}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
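
As a sanity check on this last bound, here is a minimal numerical sketch (an added aside, not from the book). For the squared loss, which is nonnegative and $\beta$-smooth with $\beta = 2\max\|x\|^2$, the RLM rule is ridge regression and has a closed form, so $A(S)$ and $A(S^{(i)})$ can be computed exactly; the data, dimensions, and the choice of $\lambda$ below are illustrative, picked only so that the assumption $\lambda \ge 2\beta/m$ holds.

```python
# Minimal sketch: for the squared loss ell(w,(x,y)) = (<w,x> - y)^2 the RLM
# rule is ridge regression, so we can check the derived stability bound
#   ||A(S^(i)) - A(S)|| <= sqrt(8*beta)/(lam*m) * (sqrt(ell(A(S),z_i)) + sqrt(ell(A(S^(i)),z')))
# directly on a few replaced indices.
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 5

def rlm(X, y, lam):
    # argmin_w (1/m)*||X w - y||^2 + lam*||w||^2  (closed-form ridge solution)
    return np.linalg.solve(X.T @ X / m + lam * np.eye(X.shape[1]), X.T @ y / m)

X = rng.normal(size=(m, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)
xp, yp = rng.normal(size=d), 0.0                 # replacement example z' = (xp, yp)

beta = 2 * max(np.linalg.norm(X, axis=1).max(), np.linalg.norm(xp)) ** 2
lam = 2 * beta / m                               # satisfies the assumption lam >= 2*beta/m

w_S = rlm(X, y, lam)
for i in range(3):                               # check a few replaced indices
    Xi, yi = X.copy(), y.copy()
    Xi[i], yi[i] = xp, yp                        # S^(i): the i-th example replaced by z'
    w_Si = rlm(Xi, yi, lam)
    lhs = np.linalg.norm(w_Si - w_S)
    rhs = np.sqrt(8 * beta) / (lam * m) * (abs(X[i] @ w_S - y[i]) + abs(xp @ w_Si - yp))
    print(f"i={i}:  ||A(S^(i)) - A(S)|| = {lhs:.4f}  <=  bound = {rhs:.4f}")
```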