
To see why this works, consider the situation after an instance a pertaining
to the first class has been added:

$(w_0 + a_0)a_0 + (w_1 + a_1)a_1 + (w_2 + a_2)a_2 + \dots + (w_k + a_k)a_k.$

This means the output for a has increased by:

$a_0 \times a_0 + a_1 \times a_1 + a_2 \times a_2 + \dots + a_k \times a_k.$
This number is always positive: every term is a square, and the first term is
exactly 1 because the bias input $a_0$ is always set to one. Thus the hyperplane
has moved in the correct direction for classifying instance a as positive.
Conversely, if an instance belonging to the second class is misclassified, the
output for that instance decreases after the modification, again moving the
hyperplane in the correct direction.
These corrections are incremental and can interfere with earlier updates.
However, it can be shown that the algorithm converges in a finite number of
iterations if the data is linearly separable. Of course, if the data is not linearly
separable, the algorithm will not terminate, so an upper bound needs to be
imposed on the number of iterations when this method is applied in practice.
The resulting hyperplane is called a perceptron, and it’s the grandfather of
neural networks (we return to neural networks in Section 6.3). Figure 4.10(b)
represents the perceptron as a graph with nodes and weighted edges, imaginatively
termed a “network” of “neurons.” There are two layers of nodes: input and
output. The input layer has one node for every attribute, plus an extra node that
is always set to one. The output layer consists of just one node. Every node in
the input layer is connected to the output layer. The connections are weighted,
and the weights are those numbers found by the perceptron learning rule.
When an instance is presented to the perceptron, its attribute values serve to
“activate” the input layer. They are multiplied by the weights and summed up
at the output node. If the weighted sum is greater than 0 the output signal is 1,
representing the first class; otherwise, it is -1, representing the second.
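
To make the learning rule concrete, here is a minimal sketch in Python (the
book’s figures use pseudocode, so the function and variable names below are
illustrative, not taken from Figure 4.10). It follows the mistake-driven
procedure described above, assuming the two classes are coded as +1 and -1: the
input node that is always set to one, the additive weight update, and an upper
bound on the number of iterations in case the data is not linearly separable.

def train_perceptron(instances, labels, max_iterations=100):
    # instances: list of numeric attribute vectors; labels: +1 or -1.
    # max_iterations caps the loop because the algorithm does not
    # terminate on data that is not linearly separable.
    weights = [0.0] * (len(instances[0]) + 1)    # +1 for the bias weight
    for _ in range(max_iterations):
        converged = True
        for a, label in zip(instances, labels):
            x = [1.0] + list(a)                  # prepend the always-one input
            output = sum(w * xi for w, xi in zip(weights, x))
            predicted = 1 if output > 0 else -1
            if predicted != label:               # mistake-driven update:
                # add the attribute vector for the first class,
                # subtract it for the second
                weights = [w + label * xi for w, xi in zip(weights, x)]
                converged = False
        if converged:                            # every instance classified
            break                                # correctly: stop early
    return weights

Classifying a new instance then amounts to prepending the always-one input,
forming the weighted sum, and outputting 1 if the sum is greater than 0 and
-1 otherwise.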

Linear classification using Winnow


The perceptron algorithm is not the only method that is guaranteed to find a
separating hyperplane for a linearly separable problem. For datasets with binary
attributes there is an alternative known as Winnow, shown in Figure 4.11(a).
The structure of the two algorithms is very similar. Like the perceptron, Winnow
only updates the weight vector when a misclassified instance is encountered—
it is mistake driven.
The two methods differ in how the weights are updated. The perceptron rule
employs an additive mechanism that alters the weight vector by adding (or
subtracting) the instance’s attribute vector. Winnow employs multiplicative
updates and alters weights individually by multiplying them by the user-specified
parameter α (or its inverse). The attribute values $a_i$ are either 0 or 1
because we are working with binary attributes.
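
As a companion to the perceptron sketch above, here is a minimal Python
rendering of Winnow’s multiplicative update for binary attributes. The
parameter names (alpha for α, theta for the threshold the weighted sum is
compared against) and the all-ones starting weights are illustrative
assumptions rather than details taken from Figure 4.11(a).

def train_winnow(instances, labels, alpha=2.0, theta=2.0, max_iterations=100):
    # instances: lists of 0/1 attribute values; labels: 1 for the first
    # class, 0 for the second. alpha and theta are assumed defaults.
    weights = [1.0] * len(instances[0])     # assumed all-ones starting point
    for _ in range(max_iterations):
        converged = True
        for a, label in zip(instances, labels):
            output = sum(w * ai for w, ai in zip(weights, a))
            predicted = 1 if output > theta else 0
            if predicted != label:          # mistake driven, like the perceptron
                converged = False
                for i, ai in enumerate(a):
                    if ai == 1:             # attributes that are 0 leave
                        if label == 1:      #   their weights unchanged
                            weights[i] *= alpha   # output was too small
                        else:
                            weights[i] /= alpha   # output was too large
        if converged:
            break
    return weights

One consequence of the multiplicative update is that the weights never change
sign, so Winnow in this form can only capture positive attribute effects.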


