Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
16.5 Bibliographic Remarks 225

16.5 Bibliographic Remarks


In the context of SVM, the kernel-trick has been introduced in Boser et al. (1992).
See also Aizerman, Braverman & Rozonoer (1964). The observation that the
kernel-trick can be applied whenever an algorithm only relies on inner products
was first stated by Sch ̈olkopf, Smola & M ̈uller (1998). The proof of the representer
theorem is given in (Sch ̈olkopf, Herbrich, Smola & Williamson 2000, Sch ̈olkopf,
Herbrich & Smola 2001). The conditions stated in Lemma 16.2 are simplification
of conditions due to Mercer. Many useful kernel functions have been introduced
in the literature for various applications. We refer the reader to Sch ̈olkopf &
Smola (2002).

16.6 Exercises



  1. Consider the task of finding a sequence of characters in a file, as described
    in Section 16.2.1. Show that every member of the classHcan be realized by
    composing a linear classifier overψ(x), whose norm is 1 and that attains a
    margin of 1.
    2.Kernelized Perceptron:Show how to run the Perceptron algorithm while
    only accessing the instances via the kernel function.Hint:The derivation is
    similar to the derivation of implementing SGD with kernels.
    3.Kernel Ridge Regression:The ridge regression problem, with a feature
    mappingψ, is the problem of finding a vectorwthat minimizes the function


f(w) =λ‖w‖^2 +

1

2 m

∑m

i=1

(〈w,ψ(xi)〉−yi)^2 , (16.8)

and then returning the predictor

h(x) =〈w,x〉.
Show how to implement the ridge regression algorithm with kernels.
Hint:The representer theorem tells us that there exists a vectorα∈Rm
such that

∑m
i=1αiψ(xi) is a minimizer of Equation (16.8).


  1. LetGbe the Gram matrix with regard toS andK. That is,Gij =
    K(xi,xj). Defineg:Rm→Rby


g(α) =λ·αTGα+^1
2 m

∑m

i=1

(〈α,G·,i〉−yi)^2 , (16.9)

where G·,i is thei’th column ofG. Show that ifα∗minimizes Equa-
tion (16.9) thenw∗=

∑m
i=1α
i∗ψ(xi) is a minimizer off.


  1. Find a closed form expression forα∗.

  2. LetNbe any positive integer. For everyx,x′∈{ 1 ,...,N}define


K(x,x′) = min{x,x′}.
Free download pdf