Understanding Machine Learning: From Theory to Algorithms

Convex Learning Problems


[Figure: plot of a convex function f(x) against x.]

An important property of convex functions is that every local minimum of the
function is also a global minimum. Formally, let B(u, r) = {v : ‖v − u‖ ≤ r} be
a ball of radius r centered around u. We say that f(u) is a local minimum of f
at u if there exists some r > 0 such that for all v ∈ B(u, r) we have f(v) ≥ f(u).
It follows that for any v (not necessarily in B), there is a small enough α > 0
such that u + α(v − u) ∈ B(u, r) and therefore

f(u) ≤ f(u + α(v − u)).   (12.2)

If f is convex, we also have that

f(u + α(v − u)) = f(αv + (1 − α)u) ≤ (1 − α)f(u) + αf(v).   (12.3)

Combining these two inequalities gives f(u) ≤ (1 − α)f(u) + αf(v); subtracting
(1 − α)f(u) from both sides and dividing by α > 0 yields f(u) ≤ f(v). Since this
holds for every v, it follows that f(u) is also a global minimum of f.
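
To make the chord inequality of Equation (12.3) concrete, here is a minimal
numerical sketch (my own illustration, not from the text) that checks it at
random points; the choice f(x) = ‖x‖², the dimension, and the rounding
tolerance are all assumptions made for the example.

    import numpy as np

    # Sketch: check the convexity (chord) inequality of Equation (12.3),
    #   f(u + alpha*(v - u)) <= (1 - alpha)*f(u) + alpha*f(v),
    # for the assumed convex example f(x) = ||x||^2.
    rng = np.random.default_rng(0)

    def f(x):
        return float(np.dot(x, x))

    for _ in range(1000):
        u, v = rng.normal(size=3), rng.normal(size=3)
        alpha = rng.uniform()                   # alpha in [0, 1)
        lhs = f(u + alpha * (v - u))            # f(alpha*v + (1 - alpha)*u)
        rhs = (1 - alpha) * f(u) + alpha * f(v)
        assert lhs <= rhs + 1e-9                # holds up to rounding error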
Another important property of convex functions is that for every w we can
construct a tangent to f at w that lies below f everywhere. If f is differentiable,
this tangent is the linear function l(u) = f(w) + 〈∇f(w), u − w〉, where ∇f(w)
is the gradient of f at w, namely, the vector of partial derivatives of f,
∇f(w) = (∂f(w)/∂w_1, . . . , ∂f(w)/∂w_d). That is, for convex differentiable
functions,

∀u,  f(u) ≥ f(w) + 〈∇f(w), u − w〉.   (12.4)
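
As a quick sanity check of Equation (12.4), the following sketch (again my own
illustration, not from the text) verifies numerically that the tangent at w
lower-bounds f everywhere, for the assumed differentiable convex example
f(w) = ‖w‖², whose gradient is ∇f(w) = 2w; any other differentiable convex
function could be swapped in by changing f and grad_f together.

    import numpy as np

    # Sketch: check Equation (12.4), f(u) >= f(w) + <grad f(w), u - w>,
    # for the assumed example f(w) = ||w||^2 with gradient grad_f(w) = 2w.
    rng = np.random.default_rng(1)

    def f(x):
        return float(np.dot(x, x))

    def grad_f(x):
        return 2.0 * x

    for _ in range(1000):
        u, w = rng.normal(size=5), rng.normal(size=5)
        tangent = f(w) + float(np.dot(grad_f(w), u - w))  # l(u), tangent at w
        assert f(u) >= tangent - 1e-9                     # tangent lies below f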

In Chapter 14 we will generalize this inequality to nondifferentiable functions.
An illustration of Equation (12.4) is given in the following.