Understanding Machine Learning: From Theory to Algorithms

16.5 Bibliographic Remarks 225

16.5 Bibliographic Remarks

In the context of SVM, the kernel-trick has been introduced in Boser et al. (1992). See also Aizerman, Braverman & Rozonoer (1964). The observation that the kernel-trick can be applied whenever an algorithm only relies on inner products was first stated by Sch ̈olkopf, Smola & M ̈uller (1998). The proof of the representer theorem is given in (Sch ̈olkopf, Herbrich, Smola & Williamson 2000, Sch ̈olkopf, Herbrich & Smola 2001). The conditions stated in Lemma 16.2 are simplification of conditions due to Mercer. Many useful kernel functions have been introduced in the literature for various applications. We refer the reader to Sch ̈olkopf & Smola (2002).

16.6 Exercises

Consider the task of finding a sequence of characters in a file, as described
in Section 16.2.1. Show that every member of the classHcan be realized by
composing a linear classifier overψ(x), whose norm is 1 and that attains a
margin of 1.
2.Kernelized Perceptron:Show how to run the Perceptron algorithm while
only accessing the instances via the kernel function.Hint:The derivation is
similar to the derivation of implementing SGD with kernels.
3.Kernel Ridge Regression:The ridge regression problem, with a feature
mappingψ, is the problem of finding a vectorwthat minimizes the function

f(w) =λ‖w‖^2 +

1

2 m

∑m

i=1

(〈w,ψ(xi)〉−yi)^2 , (16.8)

and then returning the predictor

h(x) =〈w,x〉. Show how to implement the ridge regression algorithm with kernels. Hint:The representer theorem tells us that there exists a vectorα∈Rm such that

∑m i=1αiψ(xi) is a minimizer of Equation (16.8).

LetGbe the Gram matrix with regard toS andK. That is,Gij =
K(xi,xj). Defineg:Rm→Rby

g(α) =λ·αTGα+^1 2 m

∑m

i=1

(〈α,G·,i〉−yi)^2 , (16.9)

where G·,i is thei’th column ofG. Show that ifα∗minimizes Equa- tion (16.9) thenw∗=

∑m i=1α i∗ψ(xi) is a minimizer off.

Find a closed form expression forα∗.

LetNbe any positive integer. For everyx,x′∈{ 1 ,...,N}define

K(x,x′) = min{x,x′}.

Understanding Machine Learning: From Theory to Algorithms

16.5 Bibliographic Remarks

16.6 Exercises

1

Get our desktop app

Company

Features

Documentation

Resources