4.6 Linear Models
means that we don't have to include an additional constant element in the sum. If the sum is greater than zero, we will predict the first class; otherwise, we will predict the second class. We want to find values for the weights so that the training data is correctly classified by the hyperplane.
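In symbols, writing $a_1, a_2, \ldots, a_k$ for the attribute values and adding a dummy attribute $a_0$ that is always 1 (so that $w_0$ plays the role of the bias), the weighted sum is
$$x = w_0 a_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k,$$
and we predict the first class when $x > 0$ and the second class otherwise.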
Figure 4.10(a) gives the perceptron learning rule for finding a separating
hyperplane. The algorithm iterates until a perfect solution has been found, but
it will only work properly if a separating hyperplane exists, that is, if the data is
linearly separable. Each iteration goes through all the training instances. If a misclassified instance is encountered, the parameters of the hyperplane are changed so that the misclassified instance moves closer to the hyperplane, or even across it onto the correct side. If the instance belongs to the first class, this is done by adding its attribute values to the weight vector; otherwise, they are subtracted from it.
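To see why this update helps, suppose an instance with attribute vector $\mathbf{a}$ belonging to the first class is misclassified, so its weighted sum $\mathbf{w} \cdot \mathbf{a}$ is not positive. After the update the weight vector becomes $\mathbf{w} + \mathbf{a}$, and the new sum is
$$(\mathbf{w} + \mathbf{a}) \cdot \mathbf{a} = \mathbf{w} \cdot \mathbf{a} + \|\mathbf{a}\|^2,$$
which is strictly larger, because $\|\mathbf{a}\|^2 > 0$ (the bias attribute $a_0 = 1$ guarantees the vector is nonzero). The symmetric argument applies when the instance's values are subtracted. Each update therefore nudges the sum for the misclassified instance in the right direction, although several passes may be needed before every instance ends up on the correct side.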
Set all weights to zero
Until all instances in the training data are classified correctly
    For each instance I in the training data
        If I is classified incorrectly by the perceptron
            If I belongs to the first class add it to the weight vector
            else subtract it from the weight vector
(a)
[Figure 4.10(b): the perceptron drawn as a neural network. A "bias" input fixed to 1 and the attribute inputs a1, a2, a3, ... feed into a single output unit through the weights w0, w1, w2, ..., wk.]
(b)
Figure 4.10 The perceptron: (a) learning rule and (b) representation as a neural network.
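As a concrete illustration, the following Python sketch implements the learning rule of Figure 4.10(a). The function name, the NumPy encoding, and the max_epochs safeguard are our own choices rather than part of the original algorithm; class labels are encoded as +1 for the first class and -1 for the second.

```python
import numpy as np

def train_perceptron(instances, classes, max_epochs=100):
    """Find a separating hyperplane for linearly separable data.

    instances: array-like of shape (n, k) holding attribute values
    classes:   n labels, +1 for the first class, -1 for the second
    """
    # Prepend the constant "bias" attribute a0 = 1 to every instance,
    # so w0 acts as the bias and no separate constant term is needed.
    a = np.hstack([np.ones((len(instances), 1)),
                   np.asarray(instances, dtype=float)])
    w = np.zeros(a.shape[1])          # set all weights to zero

    for _ in range(max_epochs):       # guard against non-separable data
        converged = True
        for x, y in zip(a, classes):
            # Predict the first class when the weighted sum is positive
            predicted = 1 if np.dot(w, x) > 0 else -1
            if predicted != y:        # misclassified: adjust the hyperplane
                w += y * x            # add for first class, subtract otherwise
                converged = False
        if converged:                 # every instance classified correctly
            return w
    raise RuntimeError("no separating hyperplane found within max_epochs")

# Example: two trivially separable one-attribute instances
# weights = train_perceptron([[0.0], [1.0]], [-1, 1])
```

The epoch limit replaces the unbounded "until" loop of the pseudocode, which would never terminate if the data were not linearly separable.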