Understanding Machine Learning: From Theory to Algorithms
…in the context of stochastic optimization. See, for example, (Nemirovski & Yudin 1978, Nesterov & Nest…
15 Support Vector Machines

In this chapter and the next we discuss a very useful machine learning tool: the support vector machine…
15.1 Margin and Hard-SVM

[Figure: two hyperplanes, one dashed-black and one solid-green, both separating the same four examples.]

While both the dashed-black and solid-green hyperplanes separate the four examples, our intuition…
…between $x$ and $u$ is at least the distance between $x$ and $v$, which concludes our proof. On the basis of the preceding claim…
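As a quick numerical sanity check of this claim (an illustrative sketch, not from the text; the random data and fixed seed are mine): with $\|\mathbf{w}\|=1$, the closest point on the hyperplane $\{v:\langle\mathbf{w},v\rangle+b=0\}$ to $x$ is $v=x-(\langle\mathbf{w},x\rangle+b)\mathbf{w}$, at distance $|\langle\mathbf{w},x\rangle+b|$.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
w /= np.linalg.norm(w)           # normalize so that ||w|| = 1
b = 0.7
x = rng.normal(size=3)

v = x - (w @ x + b) * w          # candidate closest point on the hyperplane
print(abs(w @ v + b) < 1e-12)    # True: v lies on the hyperplane
print(np.isclose(np.linalg.norm(x - v), abs(w @ x + b)))  # distance matches

# any other point u on the hyperplane is at least as far from x
u = v + np.cross(w, rng.normal(size=3))   # move within the hyperplane
print(abs(w @ u + b) < 1e-12)             # True: u is on the hyperplane too
print(np.linalg.norm(x - u) >= np.linalg.norm(x - v))   # True
```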
…problem given in Equation (15.2). Therefore, $\|\mathbf{w}_0\| \le \left\|\frac{\mathbf{w}^\star}{\gamma^\star}\right\| = \frac{1}{\gamma^\star}$. It follows that for all $i$, $y_i\langle\hat{\mathbf{w}},\mathbf{x}_i\rangle$…
$S' = (\alpha\mathbf{x}_1, y_1),\ldots,(\alpha\mathbf{x}_m, y_m)$ is separable with a margin of $\alpha\gamma$. That is, a simple scaling of the data…
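Since Equation (15.2) is a convex quadratic program, it can be handed to an off-the-shelf solver. A minimal sketch, assuming the third-party cvxpy package and a small separable sample of my own making:

```python
import cvxpy as cp
import numpy as np

# Hard-SVM (Equation (15.2)): minimize ||w||^2 subject to
# y_i (<w, x_i> + b) >= 1 for all i, on a separable toy sample.
X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()

w_hat, b_hat = w.value, b.value
margin = 1.0 / np.linalg.norm(w_hat)   # margin of the returned halfspace
print(w_hat, b_hat, margin)
```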
15.2 Soft-SVM and Norm Regularization

…terms is controlled by a parameter $\lambda$. This leads to the Soft-SVM optimization problem:
$$\min_{\mathbf{w},b,\boldsymbol{\xi}}\ \left(\lambda\|\mathbf{w}\|^2+\frac{1}{m}\sum_{i=1}^{m}\xi_i\right)\quad\text{s.t.}\quad\forall i,\ y_i(\langle\mathbf{w},\mathbf{x}_i\rangle+b)\ge 1-\xi_i\ \text{ and }\ \xi_i\ge 0.$$
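Since at the optimum each slack variable takes the value $\xi_i=\max\{0,\,1-y_i(\langle\mathbf{w},\mathbf{x}_i\rangle+b)\}$, the same problem can be written in the equivalent unconstrained, regularized hinge-loss form, which is the form the SGD solver later in the chapter targets:
$$\min_{\mathbf{w},b}\ \left(\lambda\|\mathbf{w}\|^2+\frac{1}{m}\sum_{i=1}^{m}\max\big\{0,\ 1-y_i(\langle\mathbf{w},\mathbf{x}_i\rangle+b)\big\}\right).$$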
15.2.1 The Sample Complexity of Soft-SVM

We now analyze the sample complexity of Soft-SVM for the case…
…examples, $\rho$, the norm of the halfspace, $B$ (or, equivalently, the margin parameter $\gamma$), and, in…
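Schematically (reconstructed from the standard analysis, with the exact constant omitted; the precise statement is in the text), the resulting bound has the shape
$$\mathbb{E}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}^{0\text{-}1}(A(S))\big]\ \le\ \min_{\mathbf{w}:\|\mathbf{w}\|\le B} L_{\mathcal{D}}^{\mathrm{hinge}}(\mathbf{w})\ +\ O\!\left(\sqrt{\frac{\rho^2 B^2}{m}}\right),$$
which depends on the norm of the examples and of the halfspace, but not on the dimension.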
…insensitive, and therefore there is no meaning to the norm of $\mathbf{w}$ or its margin when we measure error with…
15.4 Duality*

Lemma 15.9 (Fritz John) Suppose that
$$\mathbf{w}^\star\in\operatorname*{argmin}_{\mathbf{w}}\ f(\mathbf{w})\quad\text{s.t.}\quad\forall i\in[m],\ g_i(\mathbf{w})\le 0,$$
where $f,g_1,\ldots,g_m$ are differentiable…
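For orientation, the conclusion of the Fritz John lemma (stated here in its standard form) is that at $\mathbf{w}^\star$ the gradient of the objective is a combination of the gradients of the active constraints:
$$\exists\,\boldsymbol{\eta}\in\mathbb{R}^m\ \text{such that}\ \nabla f(\mathbf{w}^\star)+\sum_{i\in I}\eta_i\,\nabla g_i(\mathbf{w}^\star)=0,\qquad I=\{i\in[m]:\ g_i(\mathbf{w}^\star)=0\}.$$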
…problem with respect to $\mathbf{w}$ is unconstrained and the objective is differentiable; thus, at the optimum,…
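Concretely, for the homogeneous Hard-SVM objective, setting the gradient with respect to $\mathbf{w}$ to zero expresses the optimum as a combination of the examples, and substituting back yields the standard dual problem (a textbook derivation, sketched here):
$$\mathbf{w}=\sum_{i=1}^{m}\alpha_i y_i\mathbf{x}_i \quad\Longrightarrow\quad \max_{\boldsymbol{\alpha}\in\mathbb{R}^m:\,\boldsymbol{\alpha}\ge 0}\ \sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\langle\mathbf{x}_i,\mathbf{x}_j\rangle.$$
Note that the dual depends on the data only through the inner products $\langle\mathbf{x}_i,\mathbf{x}_j\rangle$, a fact the next chapter exploits.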
SGD for Solving Soft-SVM

goal: Solve Equation (15.12)
parameter: $T$
initialize: $\boldsymbol{\theta}^{(1)} = 0$
for $t = 1,\ldots,T$
    Let $\mathbf{w}^{(t)} = \frac{1}{\lambda t}\,\boldsymbol{\theta}^{(t)}$
    Choose $i$ uniformly at random from $[m]$
    If $y_i\langle\mathbf{w}^{(t)},\mathbf{x}_i\rangle < 1$
        Set $\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)} + y_i\mathbf{x}_i$
    Else
        Set $\boldsymbol{\theta}^{(t+1)} = \boldsymbol{\theta}^{(t)}$
output: $\bar{\mathbf{w}} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{w}^{(t)}$
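A direct NumPy transcription of this pseudocode (a sketch; the function name and the fixed random seed are mine). The averaging of the iterates is what the analysis of SGD for strongly convex objectives requires:

```python
import numpy as np

def sgd_soft_svm(X, y, lam, T, rng=None):
    """SGD for Soft-SVM, transcribing the pseudocode above.

    X: (m, d) array of examples; y: (m,) array of labels in {-1, +1};
    lam: the regularization parameter lambda; T: number of iterations.
    Returns the averaged iterate (1/T) * sum_t w^(t).
    """
    if rng is None:
        rng = np.random.default_rng(0)   # fixed seed, for reproducibility
    m, d = X.shape
    theta = np.zeros(d)                  # theta^(1) = 0
    w_sum = np.zeros(d)
    for t in range(1, T + 1):
        w = theta / (lam * t)            # w^(t) = (1 / (lam * t)) * theta^(t)
        w_sum += w
        i = rng.integers(m)              # i uniform over [m]
        if y[i] * (w @ X[i]) < 1:        # hinge loss is active at example i
            theta = theta + y[i] * X[i]  # theta^(t+1) = theta^(t) + y_i x_i
    return w_sum / T                     # output: averaged w
```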
15.8 Exercises

Show that the hard-SVM rule, namely,
$$\operatorname*{argmax}_{(\mathbf{w},b):\|\mathbf{w}\|=1}\ \min_{i\in[m]}\ |\langle\mathbf{w},\mathbf{x}_i\rangle+b|\quad\text{s.t.}\quad\forall i,\ y_i(\langle\mathbf{w},\mathbf{x}_i\rangle+b)>0,$$
…
16 Kernel Methods

In the previous chapter we described the SVM paradigm for learning halfspaces in high dimensional feature spaces…
…first define a mapping $\psi:\mathbb{R}\to\mathbb{R}^2$ as follows: $\psi(x) = (x, x^2)$. We use the term feature space to denote the range of $\psi$…
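To see the effect of this mapping (with a made-up four-point sample; the separator values are chosen by hand): points on the line labeled positive inside an interval around the origin are not separable by any threshold in $\mathbb{R}$, but become linearly separable in the feature space.

```python
import numpy as np

def psi(x):
    # the feature map psi : R -> R^2, psi(x) = (x, x^2)
    return np.array([x, x ** 2])

# toy sample: label +1 iff |x| <= 1 (an interval around the origin)
xs = np.array([-2.0, -0.5, 0.3, 1.7])
ys = np.array([-1.0, 1.0, 1.0, -1.0])

# in R no single threshold separates the labels; in the feature space
# the halfspace w = (0, -1), b = 2 (i.e., x^2 <= 2) separates them
w, b = np.array([0.0, -1.0]), 2.0
preds = np.sign([w @ psi(x) + b for x in xs])
print(np.array_equal(preds, ys))   # True: separable after the mapping
```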
16.2 The Kernel Trick

As before, we can rewrite $p(x)=\langle\mathbf{w},\psi(x)\rangle$ where now $\psi:\mathbb{R}^n\to\mathbb{R}^d$ is such that for every $J\in[n]^r$, $r\le k$, the coordinate…
…$\mathcal{X}$ into a space where these similarities are realized as inner products. It turns out that many learning algorithms…
On the basis of the representer theorem we can optimize Equation (16.2) with respect to the coefficients…
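As one concrete instance (using the squared loss; the function names here are mine), the substitution $\mathbf{w}=\sum_j \alpha_j\psi(\mathbf{x}_j)$ turns training into a problem over $\boldsymbol{\alpha}$ that accesses the data only through the Gram matrix $G_{i,j}=K(\mathbf{x}_i,\mathbf{x}_j)$, never through $\psi$ itself:

```python
import numpy as np

def train_kernel_ridge(K, X, y, lam):
    """Minimize (1/m) * ||G a - y||^2 + lam * a^T G a over a in R^m,
    where G[i, j] = K(X[i], X[j]) is the Gram matrix.

    Setting the gradient to zero gives the standard closed form
    a = (G + lam * m * I)^{-1} y.
    """
    m = len(y)
    G = np.array([[K(xi, xj) for xj in X] for xi in X])
    alpha = np.linalg.solve(G + lam * m * np.eye(m), y)
    # the predictor needs only kernel evaluations against the sample
    return lambda x: sum(a * K(xi, x) for a, xi in zip(alpha, X))

# usage with a degree-2 polynomial kernel on a tiny regression task:
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
h = train_kernel_ridge(lambda u, v: (1.0 + np.dot(u, v)) ** 2, X, y, lam=1e-3)
print(h(np.array([1.5])))   # approximately 2.25
```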
…is extremely large while implementing the kernel function is very simple. A few examples are given in the following…
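Two standard examples in code (a sketch; kernel parameter conventions vary across authors):

```python
import numpy as np

def polynomial_kernel(x, xp, k):
    # K(x, x') = (1 + <x, x'>)^k : an inner product in the space of all
    # monomials of degree at most k, computed without ever forming psi(x)
    return (1.0 + np.dot(x, xp)) ** k

def gaussian_kernel(x, xp, sigma):
    # K(x, x') = exp(-||x - x'||^2 / (2 * sigma)): an inner product in an
    # infinite-dimensional feature space (some authors divide by 2*sigma^2)
    diff = np.asarray(x) - np.asarray(xp)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma))

# the kernel is cheap even though the implicit feature space is huge:
x, xp = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, xp, k=3), gaussian_kernel(x, xp, sigma=1.0))
```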