Understanding Machine Learning: From Theory to Algorithms

12.1 Convexity, Lipschitzness, and Smoothness

Intuitively, a Lipschitz function cannot change too fast. Note that if $f : \mathbb{R} \to \mathbb{R}$ is differentiable, then by the mean value theorem we have

$$f(w_1) - f(w_2) = f'(u)(w_1 - w_2),$$

where $u$ is some point between $w_1$ and $w_2$. It follows that if the derivative of $f$ is everywhere bounded (in absolute value) by $\rho$, then the function is $\rho$-Lipschitz.
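
As a quick illustration of this fact, here is a minimal numerical sketch (not from the book; the helper name `estimate_lipschitz` and the grid parameters are ad hoc choices): bounding $|f'|$ on a dense grid gives an estimate of a Lipschitz constant for a differentiable function.

```python
import numpy as np

def estimate_lipschitz(f_prime, lo, hi, num=100_000):
    """Approximate sup |f'(u)| over [lo, hi] by scanning a dense grid.

    Numerically this is only a lower bound on the supremum, but for
    well-behaved derivatives it tracks the true Lipschitz constant.
    """
    grid = np.linspace(lo, hi, num)
    return np.abs(f_prime(grid)).max()

# f(x) = log(1 + exp(x)) has f'(x) = 1 / (exp(-x) + 1), so |f'| <= 1.
rho = estimate_lipschitz(lambda x: 1.0 / (np.exp(-x) + 1.0), -50.0, 50.0)
print(rho)  # ~1.0, matching the 1-Lipschitz claim in Example 12.4 below
```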


Example 12.4



  • The function $f(x) = |x|$ is 1-Lipschitz over $\mathbb{R}$. This follows from the triangle
    inequality: for every $x_1, x_2$,

    $$|x_1| - |x_2| = |x_1 - x_2 + x_2| - |x_2| \le |x_1 - x_2| + |x_2| - |x_2| = |x_1 - x_2|.$$

    Since this holds for both the pair $x_1, x_2$ and the pair $x_2, x_1$, we obtain that
    $\big||x_1| - |x_2|\big| \le |x_1 - x_2|$.


  • The function $f(x) = \log(1 + \exp(x))$ is 1-Lipschitz over $\mathbb{R}$. To see this, observe
    that

    $$|f'(x)| = \left|\frac{\exp(x)}{1 + \exp(x)}\right| = \left|\frac{1}{\exp(-x) + 1}\right| \le 1.$$


  • The function $f(x) = x^2$ is not $\rho$-Lipschitz over $\mathbb{R}$ for any $\rho$. To see this, take
    $x_1 = 0$ and $x_2 = 1 + \rho$; then

    $$f(x_2) - f(x_1) = (1 + \rho)^2 > \rho(1 + \rho) = \rho\,|x_2 - x_1|.$$

    However, this function is $\rho$-Lipschitz over the set $C = \{x : |x| \le \rho/2\}$.
    Indeed, for any $x_1, x_2 \in C$ we have

    $$|x_1^2 - x_2^2| = |x_1 + x_2|\,|x_1 - x_2| \le 2(\rho/2)\,|x_1 - x_2| = \rho\,|x_1 - x_2|.$$


  • The linear function $f : \mathbb{R}^d \to \mathbb{R}$ defined by $f(w) = \langle v, w \rangle + b$, where $v \in \mathbb{R}^d$,
    is $\|v\|$-Lipschitz. Indeed, by the Cauchy-Schwarz inequality,

    $$|f(w_1) - f(w_2)| = |\langle v, w_1 - w_2 \rangle| \le \|v\|\,\|w_1 - w_2\|.$$
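
The bounds in Example 12.4 can be sanity-checked numerically. The sketch below is an illustration, not from the book; the helper `max_ratio` and the sampling ranges are ad hoc choices. It maximizes the difference quotient $|f(x_1) - f(x_2)| / |x_1 - x_2|$ over random pairs, which lower-bounds the true Lipschitz constant; the printed values stay below the claimed constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_ratio(f, xs):
    """Largest |f(x1) - f(x2)| / |x1 - x2| over sampled pairs:
    a lower bound on the true Lipschitz constant."""
    x1, x2 = xs, rng.permutation(xs)
    mask = x1 != x2
    return np.max(np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask]))

xs = rng.uniform(-10, 10, size=100_000)
print(max_ratio(np.abs, xs))                         # <= 1 for f(x) = |x|
print(max_ratio(lambda x: np.log1p(np.exp(x)), xs))  # <= 1

rho = 4.0                             # f(x) = x^2 restricted to |x| <= rho/2
xs_c = rng.uniform(-rho / 2, rho / 2, size=100_000)
print(max_ratio(np.square, xs_c))                    # <= rho

# Linear f(w) = <v, w> + b is ||v||-Lipschitz, by Cauchy-Schwarz.
v, b = rng.normal(size=5), 0.3
w1, w2 = rng.normal(size=5), rng.normal(size=5)
lhs = abs((v @ w1 + b) - (v @ w2 + b))
assert lhs <= np.linalg.norm(v) * np.linalg.norm(w1 - w2)
```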

The following claim shows that composition of Lipschitz functions preserves
Lipschitzness.


claim 12.7 Let $f(x) = g_1(g_2(x))$, where $g_1$ is $\rho_1$-Lipschitz and $g_2$ is $\rho_2$-Lipschitz. Then, $f$ is $(\rho_1\rho_2)$-Lipschitz. In particular, if $g_2$ is the linear function, $g_2(x) = \langle v, x \rangle + b$, for some $v \in \mathbb{R}^d$, $b \in \mathbb{R}$, then $f$ is $(\rho_1\|v\|)$-Lipschitz.


Proof

$$
\begin{aligned}
|f(w_1) - f(w_2)| &= |g_1(g_2(w_1)) - g_1(g_2(w_2))| \\
&\le \rho_1\,\|g_2(w_1) - g_2(w_2)\| \\
&\le \rho_1\rho_2\,\|w_1 - w_2\|.
\end{aligned}
$$
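
As a concrete instance of Claim 12.7, the sketch below (illustrative, not from the book; the random test pairs are an ad hoc choice) composes the 1-Lipschitz function $g_1(a) = \log(1 + \exp(a))$ from Example 12.4 with a linear $g_2(w) = \langle v, w \rangle + b$, and checks the claimed $(\rho_1\|v\|)$-Lipschitz bound on random pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

# g1(a) = log(1 + exp(a)) is 1-Lipschitz (Example 12.4);
# g2(w) = <v, w> + b is ||v||-Lipschitz.  By Claim 12.7 the
# composition f = g1 o g2 is (1 * ||v||)-Lipschitz.
v, b = rng.normal(size=3), 0.5
f = lambda w: np.log1p(np.exp(v @ w + b))

rho = 1.0 * np.linalg.norm(v)  # claimed constant rho_1 * ||v||
for _ in range(1000):
    w1, w2 = rng.normal(size=3), rng.normal(size=3)
    assert abs(f(w1) - f(w2)) <= rho * np.linalg.norm(w1 - w2) + 1e-12
print("composition bound holds on all sampled pairs")
```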