Understanding Machine Learning: From Theory to Algorithms
16.2 The Kernel Trick 221 simple. More generally, given a scalar σ > 0, the Gaussian kernel is defined to be K(x, x′) = e− ‖x−x′‖ ...
222 Kernel Methods complexity that is polynomial in d. However, the dimension of the feature space is exponential in d so a direct ...
16.3 Implementing Soft-SVM with Kernels 223 directly tackles the Soft-SVM optimization problem in the feature space, min_w ( λ/2 ...
224 Kernel Methods space. By the definition of α^(t) = (1/(λt)) β^(t) and w^(t) = (1/(λt)) θ^(t), this claim implies that Equation (16.7) also hold ...
16.5 Bibliographic Remarks 225 16.5 Bibliographic Remarks In the context of SVM, the kernel-trick has been introduced in Boser e ...
226 Kernel Methods Prove that K is a valid kernel; namely, find a mapping ψ : {1, ..., N} → H, where H is some Hilbert space, such that ∀ ...
17 Multiclass, Ranking, and Complex Prediction Problems Multiclass categorization is the problem of classifying instances into o ...
228 Multiclass, Ranking, and Complex Prediction Problems sifiers, each of which discriminates between one class and the rest of ...
17.1 One-versus-All and All-Pairs 229 All-Pairs input: training set S = (x_1, y_1), ..., (x_m, y_m) algorithm for binary classificatio ...
230 Multiclass, Ranking, and Complex Prediction Problems that even though the approximation error of the class of predictors of ...
17.2 Linear Multiclass Predictors 231 Chapter 16 and as we will discuss in more detail in Chapter 25). Two examples of useful co ...
232 Multiclass, Ranking, and Complex Prediction Problems short. Intuitively, Ψj(x, y) should be large if the word corresponding ...
17.2 Linear Multiclass Predictors 233 loss functions (see Section 12.3). In particular, we generalize the hinge loss to multicla ...
234 Multiclass, Ranking, and Complex Prediction Problems For each y′ ≠ y, the difference between ⟨w, Ψ(x, y)⟩ and ⟨w, Ψ(x, y′)⟩ is larg ...
17.2 Linear Multiclass Predictors 235 Consider running Multiclass SVM with λ = √(2ρ^2 / (B^2 m)) on a training set S ∼ D^m and let h_w be th ...
236 Multiclass, Ranking, and Complex Prediction Problems 17.3 Structured Output Prediction Structured output prediction problems ...
17.3 Structured Output Prediction 237 words (i.e., sequences of letters) in Y. We define the function ∆(y′, y) to be the average n ...
238 Multiclass, Ranking, and Complex Prediction Problems while the feature function Ψ_{i,j,2} can be written in terms of φ_{i,j,2}( ...
17.4 Ranking 239 X of arbitrary length. A ranking hypothesis, h, is a function that receives a sequence of instances x̄ = (x_1, .. ...
240 Multiclass, Ranking, and Complex Prediction Problems We can easily see that ∆(y′, y) ∈ [0, 1] and that ∆(y′, y) = 0 whenever π(y′ ...