problem with respect to $w$ is unconstrained and the objective is differentiable; thus, at the optimum, the gradient equals zero:
\[
w - \sum_{i=1}^m \alpha_i y_i x_i = 0 \quad\Rightarrow\quad w = \sum_{i=1}^m \alpha_i y_i x_i .
\]
This shows us that the solution must be in the linear span of the examples, a fact we will use later to derive SVM with kernels. Plugging the preceding into
Equation (15.9) we obtain that the dual problem can be rewritten as
\[
\max_{\alpha\in\mathbb{R}^m:\ \alpha\ge 0}\ \ \frac{1}{2}\left\|\sum_{i=1}^m \alpha_i y_i x_i\right\|^2 + \sum_{i=1}^m \alpha_i\left(1 - y_i\left\langle \sum_{j}\alpha_j y_j x_j,\; x_i\right\rangle\right). \tag{15.10}
\]
Rearranging (the squared norm equals $\sum_{i,j}\alpha_i\alpha_j y_i y_j\langle x_j, x_i\rangle$, and the same double sum appears, negated, in the second term of Equation (15.10)) yields the dual problem
\[
\max_{\alpha\in\mathbb{R}^m:\ \alpha\ge 0}\ \ \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j y_i y_j \langle x_j, x_i\rangle . \tag{15.11}
\]
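To make the rearrangement concrete, the sketch below (hypothetical data and function names, using NumPy) evaluates both Equation (15.10) and Equation (15.11) at a feasible $\alpha \ge 0$ and checks that they coincide; the second form touches the examples only through the Gram matrix of inner products.

```python
import numpy as np

def dual_15_10(alpha, X, y):
    """Direct evaluation of the objective in Equation (15.10)."""
    w = (alpha * y) @ X                          # w = sum_i alpha_i y_i x_i
    margins = y * (X @ w)                        # y_i <sum_j alpha_j y_j x_j, x_i>
    return 0.5 * np.dot(w, w) + np.sum(alpha * (1.0 - margins))

def dual_15_11(alpha, X, y):
    """Rearranged dual objective of Equation (15.11), via the Gram matrix only."""
    Q = (y[:, None] * y[None, :]) * (X @ X.T)    # Q[i, j] = y_i y_j <x_j, x_i>
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

# hypothetical data: the two expressions agree for any alpha >= 0
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = rng.choice([-1.0, 1.0], size=6)
alpha = rng.uniform(size=6)
assert np.isclose(dual_15_10(alpha, X, y), dual_15_11(alpha, X, y))
```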
Note that the dual problem only involves inner products between instances and does not require direct access to specific elements within an instance. This property is important when implementing SVM with kernels, as we will discuss in the next chapter.

15.5 Implementing Soft-SVM Using SGD
In this section we describe a very simple algorithm for solving the optimization problem of Soft-SVM, namely,
\[
\min_{w}\ \left( \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^m \max\{0,\; 1 - y_i\langle w, x_i\rangle\} \right). \tag{15.12}
\]
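For concreteness, here is a minimal sketch (in NumPy, with hypothetical names) of how the objective in Equation (15.12) could be evaluated for a given $w$; the SGD procedure described next never needs to form this value explicitly, but it can be useful for monitoring progress.

```python
import numpy as np

def soft_svm_objective(w, X, y, lam):
    """Regularized hinge-loss objective of Equation (15.12).

    w   : (d,) weight vector
    X   : (m, d) matrix whose rows are the examples x_i
    y   : (m,) labels in {-1, +1}
    lam : regularization parameter lambda
    """
    margins = y * (X @ w)                    # y_i <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)   # max{0, 1 - y_i <w, x_i>}
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```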
We rely on the SGD framework for solving regularized loss minimization problems, as described in Section 14.5.3.
Recall that, on the basis of Equation (14.15), we can rewrite the update rule of SGD as
\[
w^{(t+1)} = -\frac{1}{\lambda t}\sum_{j=1}^{t} v_j ,
\]
where $v_j$ is a subgradient of the loss function at $w^{(j)}$ on the random example chosen at iteration $j$. For the hinge loss, given an example $(x, y)$, we can choose $v_j$ to be $0$ if $y\langle w^{(j)}, x\rangle \ge 1$ and $v_j = -y x$ otherwise (see Example 14.2). Denoting $\theta^{(t)} = -\sum_{j<t} v_j$ we obtain the following procedure.
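One possible rendering of this procedure in code is sketched below (hypothetical names and synthetic data, using NumPy). Following the update rule above, it maintains $\theta^{(t)}$, takes the iterate $w^{(t)} = \theta^{(t)}/(\lambda t)$, and returns the average of the iterates; the averaging step and the choice of the number of iterations $T$ are assumptions of this sketch rather than prescriptions of the text above.

```python
import numpy as np

def sgd_soft_svm(X, y, lam, T, seed=0):
    """SGD sketch for the Soft-SVM objective: accumulate theta = -(sum of hinge-loss
    subgradients), take w^(t) = theta^(t) / (lam * t), and return the averaged iterate."""
    rng = np.random.default_rng(seed)
    m, d = X.shape
    theta = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(1, T + 1):
        w = theta / (lam * t)              # current iterate w^(t)
        i = rng.integers(m)                # example chosen uniformly at random
        if y[i] * np.dot(w, X[i]) < 1:     # hinge loss active: subgradient v = -y_i x_i
            theta += y[i] * X[i]           # theta accumulates -v = y_i x_i
        w_sum += w
    return w_sum / T                       # assumed output: average of the iterates

# hypothetical usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))
w_bar = sgd_soft_svm(X, y, lam=0.1, T=2000)
```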