Understanding Machine Learning: From Theory to Algorithms

Convex Learning Problems


[Figure: plot of a convex function f(x) against x.]

An important property of convex functions is that every local minimum of the
function is also a global minimum. Formally, let B(u, r) = {v : ‖v − u‖ ≤ r} be
a ball of radius r centered around u. We say that f(u) is a local minimum of f
at u if there exists some r > 0 such that for all v ∈ B(u, r) we have f(v) ≥ f(u).
It follows that for any v (not necessarily in B), there is a small enough α > 0
such that u + α(v − u) ∈ B(u, r) and therefore

f(u) ≤ f(u + α(v − u)).   (12.2)

If f is convex, we also have that

f(u + α(v − u)) = f(αv + (1 − α)u) ≤ (1 − α)f(u) + αf(v).   (12.3)

Combining these two inequalities gives f(u) ≤ (1 − α)f(u) + αf(v); subtracting
(1 − α)f(u) from both sides and dividing by α > 0 yields f(u) ≤ f(v). Since this
holds for every v, it follows that f(u) is also a global minimum of f.
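
To make the chord inequality of Equation (12.3) concrete, here is a minimal
numerical sketch (my own illustration, not from the text) that checks it at
random points; the choice f(x) = ‖x‖², the dimension, and the rounding
tolerance are all assumptions made for the example.

    import numpy as np

    # Sketch: check the convexity (chord) inequality of Equation (12.3),
    #   f(u + alpha*(v - u)) <= (1 - alpha)*f(u) + alpha*f(v),
    # for the assumed convex example f(x) = ||x||^2.
    rng = np.random.default_rng(0)

    def f(x):
        return float(np.dot(x, x))

    for _ in range(1000):
        u, v = rng.normal(size=3), rng.normal(size=3)
        alpha = rng.uniform()                   # alpha in [0, 1)
        lhs = f(u + alpha * (v - u))            # f(alpha*v + (1 - alpha)*u)
        rhs = (1 - alpha) * f(u) + alpha * f(v)
        assert lhs <= rhs + 1e-9                # holds up to rounding error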
Another important property of convex functions is that for every w we can
construct a tangent to f at w that lies below f everywhere. If f is differentiable,
this tangent is the linear function l(u) = f(w) + 〈∇f(w), u − w〉, where ∇f(w)
is the gradient of f at w, namely, the vector of partial derivatives of f,
∇f(w) = (∂f(w)/∂w_1, . . . , ∂f(w)/∂w_d). That is, for convex differentiable
functions,

∀u,  f(u) ≥ f(w) + 〈∇f(w), u − w〉.   (12.4)
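
As a quick sanity check of Equation (12.4), the following sketch (again my own
illustration, not from the text) verifies numerically that the tangent at w
lower-bounds f everywhere, for the assumed differentiable convex example
f(w) = ‖w‖², whose gradient is ∇f(w) = 2w; any other differentiable convex
function could be swapped in by changing f and grad_f together.

    import numpy as np

    # Sketch: check Equation (12.4), f(u) >= f(w) + <grad f(w), u - w>,
    # for the assumed example f(w) = ||w||^2 with gradient grad_f(w) = 2w.
    rng = np.random.default_rng(1)

    def f(x):
        return float(np.dot(x, x))

    def grad_f(x):
        return 2.0 * x

    for _ in range(1000):
        u, w = rng.normal(size=5), rng.normal(size=5)
        tangent = f(w) + float(np.dot(grad_f(w), u - w))  # l(u), tangent at w
        assert f(u) >= tangent - 1e-9                     # tangent lies below f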

In Chapter 14 we will generalize this inequality to nondifferentiable functions.
An illustration of Equation (12.4) is given in the following.