
To see why this works, consider the situation after an instance a pertaining
to the first class has been added:

$(w_0 + a_0)a_0 + (w_1 + a_1)a_1 + (w_2 + a_2)a_2 + \dots + (w_k + a_k)a_k.$

This means the output for a has increased by:

$a_0 \times a_0 + a_1 \times a_1 + a_2 \times a_2 + \dots + a_k \times a_k.$
This number is always positive: every term is a square, and the first term is
exactly 1 because the bias input $a_0$ is always set to one. Thus the hyperplane
has moved in the correct direction for classifying instance a as positive.
Conversely, if an instance belonging to the second class is misclassified, the
output for that instance decreases after the modification, again moving the
hyperplane in the correct direction.
These corrections are incremental and can interfere with earlier updates.
However, it can be shown that the algorithm converges in a finite number of
iterations if the data is linearly separable. Of course, if the data is not linearly
separable, the algorithm will not terminate, so an upper bound needs to be
imposed on the number of iterations when this method is applied in practice.
The resulting hyperplane is called a perceptron, and it’s the grandfather of
neural networks (we return to neural networks in Section 6.3). Figure 4.10(b)
represents the perceptron as a graph with nodes and weighted edges, imaginatively
termed a “network” of “neurons.” There are two layers of nodes: input and
output. The input layer has one node for every attribute, plus an extra node that
is always set to one. The output layer consists of just one node. Every node in
the input layer is connected to the output layer. The connections are weighted,
and the weights are those numbers found by the perceptron learning rule.
When an instance is presented to the perceptron, its attribute values serve to
“activate” the input layer. They are multiplied by the weights and summed up
at the output node. If the weighted sum is greater than 0 the output signal is 1,
representing the first class; otherwise, it is -1, representing the second.
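
To make the learning rule concrete, here is a minimal sketch in Python (the
book’s figures use pseudocode, so the function and variable names below are
illustrative, not taken from Figure 4.10). It follows the mistake-driven
procedure described above, assuming the two classes are coded as +1 and -1: the
input node that is always set to one, the additive weight update, and an upper
bound on the number of iterations in case the data is not linearly separable.

def train_perceptron(instances, labels, max_iterations=100):
    # instances: list of numeric attribute vectors; labels: +1 or -1.
    # max_iterations caps the loop because the algorithm does not
    # terminate on data that is not linearly separable.
    weights = [0.0] * (len(instances[0]) + 1)    # +1 for the bias weight
    for _ in range(max_iterations):
        converged = True
        for a, label in zip(instances, labels):
            x = [1.0] + list(a)                  # prepend the always-one input
            output = sum(w * xi for w, xi in zip(weights, x))
            predicted = 1 if output > 0 else -1
            if predicted != label:               # mistake-driven update:
                # add the attribute vector for the first class,
                # subtract it for the second
                weights = [w + label * xi for w, xi in zip(weights, x)]
                converged = False
        if converged:                            # every instance classified
            break                                # correctly: stop early
    return weights

Classifying a new instance then amounts to prepending the always-one input,
forming the weighted sum, and outputting 1 if the sum is greater than 0 and
-1 otherwise.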

Linear classification using Winnow


The perceptron algorithm is not the only method that is guaranteed to find a
separating hyperplane for a linearly separable problem. For datasets with binary
attributes there is an alternative known as Winnow, shown in Figure 4.11(a).
The structure of the two algorithms is very similar. Like the perceptron, Winnow
only updates the weight vector when a misclassified instance is encountered—
it is mistake driven.
The two methods differ in how the weights are updated. The perceptron rule
employs an additive mechanism that alters the weight vector by adding (or
subtracting) the instance’s attribute vector. Winnow employs multiplicative
updates and alters weights individually by multiplying them by the user-specified
parameter α (or its inverse). The attribute values $a_i$ are either 0 or 1
because we are working with binary attributes.
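
As a companion to the perceptron sketch above, here is a minimal Python
rendering of Winnow’s multiplicative update for binary attributes. The
parameter names (alpha for α, theta for the threshold the weighted sum is
compared against) and the all-ones starting weights are illustrative
assumptions rather than details taken from Figure 4.11(a).

def train_winnow(instances, labels, alpha=2.0, theta=2.0, max_iterations=100):
    # instances: lists of 0/1 attribute values; labels: 1 for the first
    # class, 0 for the second. alpha and theta are assumed defaults.
    weights = [1.0] * len(instances[0])     # assumed all-ones starting point
    for _ in range(max_iterations):
        converged = True
        for a, label in zip(instances, labels):
            output = sum(w * ai for w, ai in zip(weights, a))
            predicted = 1 if output > theta else 0
            if predicted != label:          # mistake driven, like the perceptron
                converged = False
                for i, ai in enumerate(a):
                    if ai == 1:             # attributes that are 0 leave
                        if label == 1:      #   their weights unchanged
                            weights[i] *= alpha   # output was too small
                        else:
                            weights[i] /= alpha   # output was too large
        if converged:
            break
    return weights

One consequence of the multiplicative update is that the weights never change
sign, so Winnow in this form can only capture positive attribute effects.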


