Understanding Machine Learning: From Theory to Algorithms

12.1 Convexity, Lipschitzness, and Smoothness

Intuitively, a Lipschitz function cannot change too fast. Note that if $f : \mathbb{R} \to \mathbb{R}$ is differentiable, then by the mean value theorem we have

$$f(w_1) - f(w_2) = f'(u)(w_1 - w_2),$$

where $u$ is some point between $w_1$ and $w_2$. It follows that if the derivative of $f$ is everywhere bounded (in absolute value) by $\rho$, then the function is $\rho$-Lipschitz.
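
As a quick illustration of this fact, here is a minimal numerical sketch (not from the book; the helper name `estimate_lipschitz` and the grid parameters are ad hoc choices): bounding $|f'|$ on a dense grid gives an estimate of a Lipschitz constant for a differentiable function.

```python
import numpy as np

def estimate_lipschitz(f_prime, lo, hi, num=100_000):
    """Approximate sup |f'(u)| over [lo, hi] by scanning a dense grid.

    Numerically this is only a lower bound on the supremum, but for
    well-behaved derivatives it tracks the true Lipschitz constant.
    """
    grid = np.linspace(lo, hi, num)
    return np.abs(f_prime(grid)).max()

# f(x) = log(1 + exp(x)) has f'(x) = 1 / (exp(-x) + 1), so |f'| <= 1.
rho = estimate_lipschitz(lambda x: 1.0 / (np.exp(-x) + 1.0), -50.0, 50.0)
print(rho)  # ~1.0, matching the 1-Lipschitz claim in Example 12.4 below
```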


Example 12.4



  • The function $f(x) = |x|$ is 1-Lipschitz over $\mathbb{R}$. This follows from the triangle
    inequality: for every $x_1, x_2$,

    $$|x_1| - |x_2| = |x_1 - x_2 + x_2| - |x_2| \le |x_1 - x_2| + |x_2| - |x_2| = |x_1 - x_2|.$$

    Since this holds for both the pair $x_1, x_2$ and the pair $x_2, x_1$, we obtain that
    $\big||x_1| - |x_2|\big| \le |x_1 - x_2|$.


  • The function $f(x) = \log(1 + \exp(x))$ is 1-Lipschitz over $\mathbb{R}$. To see this, observe
    that

    $$|f'(x)| = \left|\frac{\exp(x)}{1 + \exp(x)}\right| = \left|\frac{1}{\exp(-x) + 1}\right| \le 1.$$


  • The function $f(x) = x^2$ is not $\rho$-Lipschitz over $\mathbb{R}$ for any $\rho$. To see this, take
    $x_1 = 0$ and $x_2 = 1 + \rho$; then

    $$f(x_2) - f(x_1) = (1 + \rho)^2 > \rho(1 + \rho) = \rho\,|x_2 - x_1|.$$

    However, this function is $\rho$-Lipschitz over the set $C = \{x : |x| \le \rho/2\}$.
    Indeed, for any $x_1, x_2 \in C$ we have

    $$|x_1^2 - x_2^2| = |x_1 + x_2|\,|x_1 - x_2| \le 2(\rho/2)\,|x_1 - x_2| = \rho\,|x_1 - x_2|.$$


  • The linear function $f : \mathbb{R}^d \to \mathbb{R}$ defined by $f(w) = \langle v, w \rangle + b$, where $v \in \mathbb{R}^d$,
    is $\|v\|$-Lipschitz. Indeed, by the Cauchy-Schwarz inequality,

    $$|f(w_1) - f(w_2)| = |\langle v, w_1 - w_2 \rangle| \le \|v\|\,\|w_1 - w_2\|.$$
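
The bounds in Example 12.4 can be sanity-checked numerically. The sketch below is an illustration, not from the book; the helper `max_ratio` and the sampling ranges are ad hoc choices. It maximizes the difference quotient $|f(x_1) - f(x_2)| / |x_1 - x_2|$ over random pairs, which lower-bounds the true Lipschitz constant; the printed values stay below the claimed constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_ratio(f, xs):
    """Largest |f(x1) - f(x2)| / |x1 - x2| over sampled pairs:
    a lower bound on the true Lipschitz constant."""
    x1, x2 = xs, rng.permutation(xs)
    mask = x1 != x2
    return np.max(np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask]))

xs = rng.uniform(-10, 10, size=100_000)
print(max_ratio(np.abs, xs))                         # <= 1 for f(x) = |x|
print(max_ratio(lambda x: np.log1p(np.exp(x)), xs))  # <= 1

rho = 4.0                             # f(x) = x^2 restricted to |x| <= rho/2
xs_c = rng.uniform(-rho / 2, rho / 2, size=100_000)
print(max_ratio(np.square, xs_c))                    # <= rho

# Linear f(w) = <v, w> + b is ||v||-Lipschitz, by Cauchy-Schwarz.
v, b = rng.normal(size=5), 0.3
w1, w2 = rng.normal(size=5), rng.normal(size=5)
lhs = abs((v @ w1 + b) - (v @ w2 + b))
assert lhs <= np.linalg.norm(v) * np.linalg.norm(w1 - w2)
```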

The following claim shows that composition of Lipschitz functions preserves
Lipschitzness.


claim 12.7 Let $f(x) = g_1(g_2(x))$, where $g_1$ is $\rho_1$-Lipschitz and $g_2$ is $\rho_2$-Lipschitz. Then, $f$ is $(\rho_1\rho_2)$-Lipschitz. In particular, if $g_2$ is the linear function, $g_2(x) = \langle v, x \rangle + b$, for some $v \in \mathbb{R}^d$, $b \in \mathbb{R}$, then $f$ is $(\rho_1\|v\|)$-Lipschitz.


Proof

$$
\begin{aligned}
|f(w_1) - f(w_2)| &= |g_1(g_2(w_1)) - g_1(g_2(w_2))| \\
&\le \rho_1\,\|g_2(w_1) - g_2(w_2)\| \\
&\le \rho_1\rho_2\,\|w_1 - w_2\|.
\end{aligned}
$$
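
As a concrete instance of Claim 12.7, the sketch below (illustrative, not from the book; the random test pairs are an ad hoc choice) composes the 1-Lipschitz function $g_1(a) = \log(1 + \exp(a))$ from Example 12.4 with a linear $g_2(w) = \langle v, w \rangle + b$, and checks the claimed $(\rho_1\|v\|)$-Lipschitz bound on random pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

# g1(a) = log(1 + exp(a)) is 1-Lipschitz (Example 12.4);
# g2(w) = <v, w> + b is ||v||-Lipschitz.  By Claim 12.7 the
# composition f = g1 o g2 is (1 * ||v||)-Lipschitz.
v, b = rng.normal(size=3), 0.5
f = lambda w: np.log1p(np.exp(v @ w + b))

rho = 1.0 * np.linalg.norm(v)  # claimed constant rho_1 * ||v||
for _ in range(1000):
    w1, w2 = rng.normal(size=3), rng.normal(size=3)
    assert abs(f(w1) - f(w2)) <= rho * np.linalg.norm(w1 - w2) + 1e-12
print("composition bound holds on all sampled pairs")
```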