Understanding Machine Learning: From Theory to Algorithms

14.2 Subgradients

Figure 14.2 Left: The right-hand side of Equation (14.7), f(w) + 〈u−w, ∇f(w)〉, is the tangent of f at w. For a convex function, the tangent lower bounds f. Right: Illustration of several subgradients of a nondifferentiable convex function.

14.2.1 Calculating Subgradients


How do we construct subgradients of a given convex function? If a function is
differentiable at a point w, then the differential set is trivial, as the following
claim shows.

Claim 14.5 If f is differentiable at w, then ∂f(w) contains a single element –
the gradient of f at w, ∇f(w).
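
As a quick numerical illustration (not part of the text), the following Python sketch checks that for the differentiable convex function f(w) = ‖w‖², the gradient 2w satisfies the subgradient inequality f(u) ≥ f(w) + 〈u−w, ∇f(w)〉 at a point w; the choice of f and the random test points are assumptions made only for this example.

```python
import numpy as np

# Illustration of Claim 14.5 for the differentiable convex function
# f(w) = ||w||^2, whose gradient is 2w (an assumed example, not from the text).
# We verify numerically that the gradient satisfies the subgradient inequality
#     f(u) >= f(w) + <u - w, grad f(w)>   for all tested u.
f = lambda w: float(w @ w)
grad_f = lambda w: 2.0 * w

rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(1000):
    u = rng.normal(size=3)
    assert f(u) >= f(w) + (u - w) @ grad_f(w) - 1e-12
```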

Example 14.1 (The Differential Set of the Absolute Value Function) Consider the
absolute value function f(x) = |x|. Using Claim 14.5, we can easily construct
the differential set for the differentiable parts of f, and the only point that
requires special attention is x_0 = 0. At that point, it is easy to verify that the
subdifferential is the set of all numbers between −1 and 1. Hence:

∂f(x) =
    {1}        if x > 0
    {−1}       if x < 0
    [−1, 1]    if x = 0

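This case analysis translates directly into code. The following is a minimal Python sketch (our own illustration, not part of the text); the function name abs_subgradient and the tie_value parameter used to select a member of [−1, 1] at x = 0 are illustrative choices.

```python
import numpy as np

def abs_subgradient(x, tie_value=0.0):
    """Return one subgradient of f(x) = |x| (illustrative helper).

    For x > 0 the subdifferential is {1}, for x < 0 it is {-1};
    at x = 0 any number in [-1, 1] is a valid subgradient, and
    tie_value selects which one to return.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    assert -1.0 <= tie_value <= 1.0, "at x = 0 a subgradient must lie in [-1, 1]"
    return tie_value

# Numerical check of the subgradient inequality |u| >= |x| + g * (u - x).
for x in (-3.0, 0.0, 2.5):
    g = abs_subgradient(x)
    for u in np.linspace(-5, 5, 101):
        assert abs(u) >= abs(x) + g * (u - x) - 1e-12
```
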
For many practical uses, we do not need to calculate the whole set of subgradients
at a given point, as one member of this set would suffice. The following
claim shows how to construct a subgradient for pointwise maximum functions.

Claim 14.6 Let g(w) = max_{i∈[r]} g_i(w) for r convex differentiable functions
g_1, ..., g_r. Given some w, let j ∈ argmax_i g_i(w). Then ∇g_j(w) ∈ ∂g(w).

Proof Since g_j is convex, we have that for all u,

g_j(u) ≥ g_j(w) + 〈u−w, ∇g_j(w)〉.

Since g(w) = g_j(w) and g(u) ≥ g_j(u), we obtain that

g(u) ≥ g(w) + 〈u−w, ∇g_j(w)〉,

which concludes our proof.
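
To make Claim 14.6 concrete, here is a small Python sketch (our own illustration, not from the text); the helper max_subgradient, the two example functions, and the test point are all assumptions chosen only for demonstration.

```python
import numpy as np

def max_subgradient(w, funcs, grads):
    """Subgradient of g(w) = max_i g_i(w), following Claim 14.6.

    funcs : list of callables g_i(w) -> float (convex, differentiable)
    grads : list of callables returning the gradient of g_i at w
    Returns the gradient of one maximizing g_j, which is a valid
    subgradient of the pointwise maximum g at w.
    """
    values = [f(w) for f in funcs]
    j = int(np.argmax(values))      # any index attaining the maximum works
    return grads[j](w)

# Illustrative example: g(w) = max(||w||^2, <a, w> + 1) for a fixed vector a.
a = np.array([1.0, -2.0])
funcs = [lambda w: float(w @ w), lambda w: float(a @ w) + 1.0]
grads = [lambda w: 2.0 * w, lambda w: a]

w = np.array([1.0, 0.5])
print(max_subgradient(w, funcs, grads))   # gradient of the active (maximizing) function
```

The same recipe is what one uses in practice for piecewise-defined convex losses written as a pointwise maximum: evaluate the pieces, pick any maximizer, and return its gradient.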