Understanding Machine Learning: From Theory to Algorithms




hypothesis class. It is easy to verify that H is convex. The argument will be
the same as in Example 12.8, except that now the two distributions, D1, D2,
will be supported on z1 = (1/μ, 0) and z2 = (1, −1). If the algorithm A returns
ŵ < −1/2 upon receiving m examples of the second type, then we will set the
distribution to be D1 and have that

L_D1(ŵ) − min_w L_D1(w) ≥ μ(ŵ/μ)² − L_D1(0) ≥ 1/(4μ) − (1 − μ) > ε,

where μ and ε are the parameters chosen in Example 12.8.

Similarly, if ŵ ≥ −1/2 we will set the distribution to be D2 and have that

L_D2(ŵ) − min_w L_D2(w) ≥ (−1/2 + 1)² − 0 = 1/4 > ε.
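
To see concrete numbers behind these two bounds, here is a minimal Python sketch. It assumes the setup carried over from Example 12.8: the squared loss ℓ(w, (x, y)) = (wx − y)², a bounded hypothesis class taken here to be H = [−1, 1], the distribution D1 placing mass μ on z1 = (1/μ, 0) and mass 1 − μ on z2 = (1, −1), and D2 supported on z2 alone. The values of μ and ε below are illustrative choices, not the ones fixed in Example 12.8.

    import numpy as np

    mu, eps = 0.01, 1 / 100                  # illustrative values of the two parameters

    def loss(w, x, y):                       # squared loss of a single example
        return (w * x - y) ** 2

    def L_D1(w):                             # risk under D1
        return mu * loss(w, 1 / mu, 0) + (1 - mu) * loss(w, 1, -1)

    def L_D2(w):                             # risk under D2
        return loss(w, 1, -1)

    ws = np.linspace(-1, 1, 100001)          # grid over the bounded class H = [-1, 1]

    # Case 1: the algorithm returned some w_hat < -1/2; measure it against D1.
    w_hat = -0.6
    excess1 = L_D1(w_hat) - L_D1(ws).min()
    print(excess1 >= 1 / (4 * mu) - (1 - mu) > eps)    # True

    # Case 2: the algorithm returned some w_hat >= -1/2; measure it against D2.
    w_hat = -0.4
    excess2 = L_D2(w_hat) - L_D2(ws).min()
    print(excess2 >= (-1 / 2 + 1) ** 2 - 0 > eps)      # True

Both printed checks come out True: whichever half-line ŵ falls in, one of the two distributions forces an excess risk far above ε.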

This example shows that we need additional assumptions on the learning
problem, and this time the solution lies in the Lipschitzness or the smoothness
of the loss function. This motivates the definitions of two families of learning
problems, convex-Lipschitz-bounded and convex-smooth-bounded, which are given
in the next subsection.

12.2.2 Convex-Lipschitz/Smooth-Bounded Learning Problems


Definition 12.12 (Convex-Lipschitz-Bounded Learning Problem) A learning
problem, (H, Z, ℓ), is called Convex-Lipschitz-Bounded, with parameters ρ, B,
if the following holds:


  • The hypothesis class H is a convex set and for all w ∈ H we have ‖w‖ ≤ B.

  • For all z ∈ Z, the loss function, ℓ(·, z), is a convex and ρ-Lipschitz function.


Example 12.10 Let X = {x ∈ R^d : ‖x‖ ≤ ρ} and Y = R. Let H = {w ∈ R^d :
‖w‖ ≤ B} and let the loss function be ℓ(w, (x, y)) = |⟨w, x⟩ − y|. This corre-
sponds to a regression problem with the absolute-value loss, where we assume
that the instances are in a ball of radius ρ and we restrict the hypotheses to be
homogeneous linear functions defined by a vector w whose norm is bounded by
B. Then, the resulting problem is Convex-Lipschitz-Bounded with parameters
ρ, B.
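
The Lipschitz claim follows from | |⟨w, x⟩ − y| − |⟨w′, x⟩ − y| | ≤ |⟨w − w′, x⟩| ≤ ‖x‖ ‖w − w′‖ ≤ ρ ‖w − w′‖. As a minimal sketch, the following Python snippet spot-checks this inequality on random inputs; the dimension and the values of ρ and B are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, rho, B = 5, 2.0, 10.0                 # illustrative dimension and radii

    def abs_loss(w, x, y):                   # absolute-value loss |<w, x> - y|
        return abs(np.dot(w, x) - y)

    def sample_ball(radius):                 # a point drawn from the ball of the given radius
        v = rng.standard_normal(d)
        return radius * rng.random() * v / np.linalg.norm(v)

    for _ in range(10_000):
        w1, w2 = sample_ball(B), sample_ball(B)
        x, y = sample_ball(rho), float(rng.standard_normal())
        lhs = abs(abs_loss(w1, x, y) - abs_loss(w2, x, y))
        assert lhs <= rho * np.linalg.norm(w1 - w2) + 1e-12
    print("rho-Lipschitzness held on all sampled pairs")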

Definition 12.13 (Convex-Smooth-Bounded Learning Problem) A learning
problem, (H, Z, ℓ), is called Convex-Smooth-Bounded, with parameters β, B,
if the following holds:


  • The hypothesis class H is a convex set and for all w ∈ H we have ‖w‖ ≤ B.

  • For all z ∈ Z, the loss function, ℓ(·, z), is a convex, nonnegative, and β-smooth
    function.


Note that we also required that the loss function be nonnegative. This is needed
to ensure that the loss function is self-bounded, as described in the previous
section.
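
As a small illustration of why nonnegativity matters, the following Python sketch checks the self-bounded property referred to above, ‖∇ℓ(w, z)‖² ≤ 2β ℓ(w, z), for the squared loss ℓ(w, (x, y)) = (⟨w, x⟩ − y)², which is β-smooth with β = 2‖x‖²; the dimension and the random samples are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    d = 4                                     # illustrative dimension
    x, y = rng.standard_normal(d), float(rng.standard_normal())
    beta = 2 * np.dot(x, x)                   # smoothness parameter of this particular loss

    def loss(w):                              # squared loss (<w, x> - y)^2, nonnegative
        return (np.dot(w, x) - y) ** 2

    def grad(w):                              # its gradient 2(<w, x> - y) x
        return 2 * (np.dot(w, x) - y) * x

    for _ in range(1_000):
        w = rng.standard_normal(d)
        g = grad(w)
        assert np.dot(g, g) <= 2 * beta * loss(w) + 1e-9   # self-boundedness
    print("self-bounded property held at all sampled points")

For this particular loss the inequality in fact holds with equality, since ‖∇ℓ(w, z)‖² = 4(⟨w, x⟩ − y)²‖x‖² = 2β ℓ(w, z).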