straint, and all training instances become support vectors. Conversely, ifeis
large enough that the tube can enclose all the data, the error becomes zero, there
is no tradeoff to make, and the algorithm outputs the flattest tube that encloses
the data irrespective of the value ofC.
The kernel perceptron
In Section 4.6 we introduced the perceptron algorithm for learning a linear clas-
sifier. It turns out that the kernel trick can also be used to upgrade this algo-
rithm to learn nonlinear decision boundaries. To see this, we first revisit the
linear case. The perceptron algorithm repeatedly iterates through the training
data instance by instance and updates the weight vector every time one of these
instances is misclassified based on the weights learned so far. The weight vector
is updated simply by adding or subtracting the instance’s attribute values to or
from it. This means that the final weight vector is just the sum of the instances
that have been misclassified. The perceptron makes its predictions based on
whether
is greater or less than zero—where wiis the weight for the ith attribute and ai
the corresponding attribute value of the instance that we wish to classify.
Instead, we could use
Here,a¢(j) is the jth misclassified training instance,a¢(j)iis its ith attribute value,
and y(j) is its class value (either +1 or -1). To implement this we no longer keep
track of an explicit weight vector: we simply store the instances misclassified so
far and use the preceding expression to make a prediction.
It looks like we’ve gained nothing—in fact, the algorithm is much slower
because it iterates through all misclassified training instances every time a pre-
diction is made. However, closer inspection of this formula reveals that it can
be expressed in terms of dot products between instances. First, swap the sum-
mation signs to yield
The second sum is just a dot product between two instances and can be written
as
Âjyj()aa.¢(j)◊
Âjyj()Âia j a¢()i i.
ÂiÂjyja j a()¢()i i.
Âiwaii
222 CHAPTER 6| IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES