Understanding Machine Learning: From Theory to Algorithms


Convex Learning Problems



  • Given some x ∈ R^d and y ∈ R, let f : R^d → R be defined as f(w) =
    (〈w,x〉 − y)^2. Then f is a composition of the function g(a) = a^2 onto a
    linear function, and hence f is a convex function.

  • Given some x ∈ R^d and y ∈ {±1}, let f : R^d → R be defined as f(w) =
    log(1 + exp(−y〈w,x〉)). Then f is a composition of the function g(a) =
    log(1 + exp(a)) onto a linear function, and hence f is a convex function.

Finally, the following claim shows that the maximum of convex functions is
convex and that a weighted sum of convex functions, with nonnegative weights,
is also convex.
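As a quick numerical sanity check of the two examples above (a sketch, not part of the text; the feature vector x, labels, and sample points w1, w2 are arbitrary illustrative choices), the convexity inequality f(αw1 + (1−α)w2) ≤ αf(w1) + (1−α)f(w2) can be verified for both losses at a few points:

```python
import math

# Arbitrary sample data (hypothetical, for illustration only).
x = [0.5, -1.0]        # fixed feature vector in R^2
y_sq, y_log = 0.7, 1.0  # a real label and a {+1, -1} label

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def squared_loss(w):
    # f(w) = (<w, x> - y)^2
    return (dot(w, x) - y_sq) ** 2

def logistic_loss(w):
    # f(w) = log(1 + exp(-y <w, x>))
    return math.log(1.0 + math.exp(-y_log * dot(w, x)))

def jensen_holds(f, w1, w2, alpha, tol=1e-12):
    # Convexity inequality: f(a*w1 + (1-a)*w2) <= a*f(w1) + (1-a)*f(w2)
    mid = [alpha * a + (1 - alpha) * b for a, b in zip(w1, w2)]
    return f(mid) <= alpha * f(w1) + (1 - alpha) * f(w2) + tol

w1, w2 = [1.0, 2.0], [-3.0, 0.5]
for alpha in (0.0, 0.25, 0.5, 0.9, 1.0):
    assert jensen_holds(squared_loss, w1, w2, alpha)
    assert jensen_holds(logistic_loss, w1, w2, alpha)
print("convexity inequality holds at all sampled points")
```

A few sampled points do not prove convexity, of course; the composition argument in the text is what establishes it for all of R^d.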


claim 12.5 For i = 1,...,r, let f_i : R^d → R be a convex function. The
following functions from R^d to R are also convex.

  • g(x) = max_{i∈[r]} f_i(x)

  • g(x) = ∑_{i=1}^r w_i f_i(x), where for all i, w_i ≥ 0.
Proof The first claim follows by

    g(αu + (1−α)v) = max_i f_i(αu + (1−α)v)
                   ≤ max_i [α f_i(u) + (1−α) f_i(v)]
                   ≤ α max_i f_i(u) + (1−α) max_i f_i(v)
                   = α g(u) + (1−α) g(v).

For the second claim,

    g(αu + (1−α)v) = ∑_i w_i f_i(αu + (1−α)v)
                   ≤ ∑_i w_i [α f_i(u) + (1−α) f_i(v)]
                   = α ∑_i w_i f_i(u) + (1−α) ∑_i w_i f_i(v)
                   = α g(u) + (1−α) g(v).

Example 12.3 The function g(x) = |x| is convex. To see this, note that g(x) =
max{x, −x} and that both the functions f_1(x) = x and f_2(x) = −x are convex.
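Claim 12.5 and Example 12.3 can also be checked numerically. The sketch below (scalar case d = 1; the particular functions f_i, weights, and sample points are arbitrary choices, not from the text) verifies the convexity inequality for a pointwise maximum and a nonnegative weighted sum, including g(x) = |x| = max{x, −x}:

```python
# Hand-picked convex functions on R (arbitrary illustrative choices).
f1 = lambda x: x            # linear, hence convex
f2 = lambda x: -x           # linear, hence convex
f3 = lambda x: x * x        # convex
fs = [f1, f2, f3]
weights = [0.5, 2.0, 0.1]   # nonnegative, as claim 12.5 requires

g_max = lambda x: max(f(x) for f in fs)                       # pointwise max
g_sum = lambda x: sum(w * f(x) for w, f in zip(weights, fs))  # weighted sum
g_abs = lambda x: max(x, -x)                                  # |x|, Example 12.3

def convex_at(g, u, v, alpha, tol=1e-12):
    # Convexity inequality at one triple (u, v, alpha).
    mid = alpha * u + (1 - alpha) * v
    return g(mid) <= alpha * g(u) + (1 - alpha) * g(v) + tol

for u, v in [(-2.0, 3.0), (0.0, 1.5), (-0.3, -4.0)]:
    for alpha in (0.1, 0.5, 0.8):
        assert convex_at(g_max, u, v, alpha)
        assert convex_at(g_sum, u, v, alpha)
        assert convex_at(g_abs, u, v, alpha)

# Sanity check that max{x, -x} really coincides with |x|.
assert all(g_abs(t) == abs(t) for t in (-2.5, 0.0, 4.0))
print("max, weighted sum, and |x| all pass the convexity check")
```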

12.1.2 Lipschitzness


The definition of Lipschitzness below is with respect to the Euclidean norm over
R^d. However, it is possible to define Lipschitzness with respect to any norm.

definition 12.6 (Lipschitzness) Let C ⊂ R^d. A function f : R^d → R^k is
ρ-Lipschitz over C if for every w_1, w_2 ∈ C we have that ‖f(w_1) − f(w_2)‖ ≤
ρ‖w_1 − w_2‖.
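A simple empirical sanity check of this definition (a sketch, not from the text; the interval C = [−5, 5] and the trial count are arbitrary choices): sample pairs w_1, w_2 from C and confirm that the ratio |f(w_1) − f(w_2)| / |w_1 − w_2| never exceeds a candidate ρ. For f(x) = |x|, the reverse triangle inequality gives ρ = 1:

```python
import random

random.seed(0)

def max_observed_ratio(f, lo, hi, trials=10_000):
    # Largest |f(w1) - f(w2)| / |w1 - w2| over random pairs in [lo, hi];
    # this is a lower bound on the true Lipschitz constant over [lo, hi].
    worst = 0.0
    for _ in range(trials):
        w1, w2 = random.uniform(lo, hi), random.uniform(lo, hi)
        if w1 != w2:
            worst = max(worst, abs(f(w1) - f(w2)) / abs(w1 - w2))
    return worst

ratio = max_observed_ratio(abs, -5.0, 5.0)
assert ratio <= 1.0 + 1e-12   # |x| is 1-Lipschitz
print(f"largest observed ratio: {ratio:.4f}")
```

Note the asymmetry: sampling can only certify a lower bound on the Lipschitz constant, while the definition demands the inequality for every pair in C.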