Understanding Machine Learning: From Theory to Algorithms




hypothesis class. It is easy to verify that H is convex. The argument will be
the same as in Example 12.8, except that now the two distributions, D1, D2,
will be supported on z1 = (1/μ, 0) and z2 = (1, −1). If the algorithm A returns
ŵ < −1/2 upon receiving m examples of the second type, then we will set the
distribution to be D1 and have that

L_D1(ŵ) − min_w L_D1(w) ≥ μ(ŵ/μ)² − L_D1(0) ≥ 1/(4μ) − (1 − μ) > ε,

where μ and ε are the parameters chosen in Example 12.8.

Similarly, if ŵ ≥ −1/2 we will set the distribution to be D2 and have that

L_D2(ŵ) − min_w L_D2(w) ≥ (−1/2 + 1)² − 0 = 1/4 > ε.
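
To see concrete numbers behind these two bounds, here is a minimal Python sketch. It assumes the setup carried over from Example 12.8: the squared loss ℓ(w, (x, y)) = (wx − y)², a bounded hypothesis class taken here to be H = [−1, 1], the distribution D1 placing mass μ on z1 = (1/μ, 0) and mass 1 − μ on z2 = (1, −1), and D2 supported on z2 alone. The values of μ and ε below are illustrative choices, not the ones fixed in Example 12.8.

    import numpy as np

    mu, eps = 0.01, 1 / 100                  # illustrative values of the two parameters

    def loss(w, x, y):                       # squared loss of a single example
        return (w * x - y) ** 2

    def L_D1(w):                             # risk under D1
        return mu * loss(w, 1 / mu, 0) + (1 - mu) * loss(w, 1, -1)

    def L_D2(w):                             # risk under D2
        return loss(w, 1, -1)

    ws = np.linspace(-1, 1, 100001)          # grid over the bounded class H = [-1, 1]

    # Case 1: the algorithm returned some w_hat < -1/2; measure it against D1.
    w_hat = -0.6
    excess1 = L_D1(w_hat) - L_D1(ws).min()
    print(excess1 >= 1 / (4 * mu) - (1 - mu) > eps)    # True

    # Case 2: the algorithm returned some w_hat >= -1/2; measure it against D2.
    w_hat = -0.4
    excess2 = L_D2(w_hat) - L_D2(ws).min()
    print(excess2 >= (-1 / 2 + 1) ** 2 - 0 > eps)      # True

Both printed checks come out True: whichever half-line ŵ falls in, one of the two distributions forces an excess risk far above ε.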

This example shows that we need additional assumptions on the learning
problem, and this time the solution lies in the Lipschitzness or the smoothness
of the loss function. This motivates the definitions of two families of learning
problems, convex-Lipschitz-bounded and convex-smooth-bounded, which are given
in the next subsection.

12.2.2 Convex-Lipschitz/Smooth-Bounded Learning Problems


Definition 12.12 (Convex-Lipschitz-Bounded Learning Problem) A learning
problem, (H, Z, ℓ), is called Convex-Lipschitz-Bounded, with parameters ρ, B,
if the following holds:


  • The hypothesis class H is a convex set and for all w ∈ H we have ‖w‖ ≤ B.

  • For all z ∈ Z, the loss function, ℓ(·, z), is a convex and ρ-Lipschitz function.


Example 12.10 Let X = {x ∈ R^d : ‖x‖ ≤ ρ} and Y = R. Let H = {w ∈ R^d :
‖w‖ ≤ B} and let the loss function be ℓ(w, (x, y)) = |⟨w, x⟩ − y|. This corre-
sponds to a regression problem with the absolute-value loss, where we assume
that the instances are in a ball of radius ρ and we restrict the hypotheses to be
homogeneous linear functions defined by a vector w whose norm is bounded by
B. Then, the resulting problem is Convex-Lipschitz-Bounded with parameters
ρ, B.
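
The Lipschitz claim follows from | |⟨w, x⟩ − y| − |⟨w′, x⟩ − y| | ≤ |⟨w − w′, x⟩| ≤ ‖x‖ ‖w − w′‖ ≤ ρ ‖w − w′‖. As a minimal sketch, the following Python snippet spot-checks this inequality on random inputs; the dimension and the values of ρ and B are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, rho, B = 5, 2.0, 10.0                 # illustrative dimension and radii

    def abs_loss(w, x, y):                   # absolute-value loss |<w, x> - y|
        return abs(np.dot(w, x) - y)

    def sample_ball(radius):                 # a point drawn from the ball of the given radius
        v = rng.standard_normal(d)
        return radius * rng.random() * v / np.linalg.norm(v)

    for _ in range(10_000):
        w1, w2 = sample_ball(B), sample_ball(B)
        x, y = sample_ball(rho), float(rng.standard_normal())
        lhs = abs(abs_loss(w1, x, y) - abs_loss(w2, x, y))
        assert lhs <= rho * np.linalg.norm(w1 - w2) + 1e-12
    print("rho-Lipschitzness held on all sampled pairs")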

Definition 12.13 (Convex-Smooth-Bounded Learning Problem) A learning
problem, (H, Z, ℓ), is called Convex-Smooth-Bounded, with parameters β, B,
if the following holds:


  • The hypothesis class H is a convex set and for all w ∈ H we have ‖w‖ ≤ B.

  • For all z ∈ Z, the loss function, ℓ(·, z), is a convex, nonnegative, and β-smooth
    function.


Note that we also required that the loss function be nonnegative. This is needed
to ensure that the loss function is self-bounded, as described in the previous
section.
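
As a small illustration of why nonnegativity matters, the following Python sketch checks the self-bounded property referred to above, ‖∇ℓ(w, z)‖² ≤ 2β ℓ(w, z), for the squared loss ℓ(w, (x, y)) = (⟨w, x⟩ − y)², which is β-smooth with β = 2‖x‖²; the dimension and the random samples are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 4                                     # illustrative dimension
    x, y = rng.standard_normal(d), float(rng.standard_normal())
    beta = 2 * np.dot(x, x)                   # smoothness parameter of this particular loss

    def loss(w):                              # squared loss (<w, x> - y)^2, nonnegative
        return (np.dot(w, x) - y) ** 2

    def grad(w):                              # its gradient 2(<w, x> - y) x
        return 2 * (np.dot(w, x) - y) * x

    for _ in range(1_000):
        w = rng.standard_normal(d)
        g = grad(w)
        assert np.dot(g, g) <= 2 * beta * loss(w) + 1e-9   # self-boundedness
    print("self-bounded property held at all sampled points")

For this particular loss the inequality in fact holds with equality, since ‖∇ℓ(w, z)‖² = 4(⟨w, x⟩ − y)²‖x‖² = 2β ℓ(w, z).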