Understanding Machine Learning: From Theory to Algorithms

14.2 Subgradients

Figure 14.2 Left: The right-hand side of Equation (14.7), f(w) + 〈u−w, ∇f(w)〉, is the tangent of f at w. For a convex function, the tangent lower bounds f. Right: Illustration of several subgradients of a nondifferentiable convex function.

14.2.1 Calculating Subgradients


How do we construct subgradients of a given convex function? If a function is
differentiable at a point w, then the differential set is trivial, as the following
claim shows.

Claim 14.5 If f is differentiable at w, then ∂f(w) contains a single element –
the gradient of f at w, ∇f(w).
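
As a quick numerical illustration (not part of the text), the following Python sketch checks that for the differentiable convex function f(w) = ‖w‖², the gradient 2w satisfies the subgradient inequality f(u) ≥ f(w) + 〈u−w, ∇f(w)〉 at a point w; the choice of f and the random test points are assumptions made only for this example.

```python
import numpy as np

# Illustration of Claim 14.5 for the differentiable convex function
# f(w) = ||w||^2, whose gradient is 2w (an assumed example, not from the text).
# We verify numerically that the gradient satisfies the subgradient inequality
#     f(u) >= f(w) + <u - w, grad f(w)>   for all tested u.
f = lambda w: float(w @ w)
grad_f = lambda w: 2.0 * w

rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(1000):
    u = rng.normal(size=3)
    assert f(u) >= f(w) + (u - w) @ grad_f(w) - 1e-12
```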

Example 14.1 (The Differential Set of the Absolute Value Function) Consider the
absolute value function f(x) = |x|. Using Claim 14.5, we can easily construct
the differential set for the differentiable parts of f, and the only point that
requires special attention is x_0 = 0. At that point, it is easy to verify that the
subdifferential is the set of all numbers between −1 and 1. Hence:

∂f(x) =
    {1}        if x > 0
    {−1}       if x < 0
    [−1, 1]    if x = 0

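This case analysis translates directly into code. The following is a minimal Python sketch (our own illustration, not part of the text); the function name abs_subgradient and the tie_value parameter used to select a member of [−1, 1] at x = 0 are illustrative choices.

```python
import numpy as np

def abs_subgradient(x, tie_value=0.0):
    """Return one subgradient of f(x) = |x| (illustrative helper).

    For x > 0 the subdifferential is {1}, for x < 0 it is {-1};
    at x = 0 any number in [-1, 1] is a valid subgradient, and
    tie_value selects which one to return.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    assert -1.0 <= tie_value <= 1.0, "at x = 0 a subgradient must lie in [-1, 1]"
    return tie_value

# Numerical check of the subgradient inequality |u| >= |x| + g * (u - x).
for x in (-3.0, 0.0, 2.5):
    g = abs_subgradient(x)
    for u in np.linspace(-5, 5, 101):
        assert abs(u) >= abs(x) + g * (u - x) - 1e-12
```
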
For many practical uses, we do not need to calculate the whole set of subgradients
at a given point, as one member of this set would suffice. The following
claim shows how to construct a subgradient for pointwise maximum functions.

Claim 14.6 Let g(w) = max_{i∈[r]} g_i(w) for r convex differentiable functions
g_1, ..., g_r. Given some w, let j ∈ argmax_i g_i(w). Then ∇g_j(w) ∈ ∂g(w).

Proof Since g_j is convex, we have that for all u,

g_j(u) ≥ g_j(w) + 〈u−w, ∇g_j(w)〉.

Since g(w) = g_j(w) and g(u) ≥ g_j(u), we obtain that

g(u) ≥ g(w) + 〈u−w, ∇g_j(w)〉,

which concludes our proof.
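
To make Claim 14.6 concrete, here is a small Python sketch (our own illustration, not from the text); the helper max_subgradient, the two example functions, and the test point are all assumptions chosen only for demonstration.

```python
import numpy as np

def max_subgradient(w, funcs, grads):
    """Subgradient of g(w) = max_i g_i(w), following Claim 14.6.

    funcs : list of callables g_i(w) -> float (convex, differentiable)
    grads : list of callables returning the gradient of g_i at w
    Returns the gradient of one maximizing g_j, which is a valid
    subgradient of the pointwise maximum g at w.
    """
    values = [f(w) for f in funcs]
    j = int(np.argmax(values))      # any index attaining the maximum works
    return grads[j](w)

# Illustrative example: g(w) = max(||w||^2, <a, w> + 1) for a fixed vector a.
a = np.array([1.0, -2.0])
funcs = [lambda w: float(w @ w), lambda w: float(a @ w) + 1.0]
grads = [lambda w: 2.0 * w, lambda w: a]

w = np.array([1.0, 0.5])
print(max_subgradient(w, funcs, grads))   # gradient of the active (maximizing) function
```

The same recipe is what one uses in practice for piecewise-defined convex losses written as a pointwise maximum: evaluate the pieces, pick any maximizer, and return its gradient.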