
$-w_0 - w_1 a_1 - \cdots - w_k a_k = 0.$

Because this is a linear equality in the attribute values, the boundary is a linear
plane, or hyperplane, in instance space. It is easy to visualize sets of points that
cannot be separated by a single hyperplane, and these cannot be discriminated
correctly by logistic regression.
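To make this concrete, here is a minimal Python sketch (not from the book) of the two-class decision: the estimated probability crosses 0.5 exactly where the weighted sum is zero, so only the side of the hyperplane matters. The function name and the class labels 1 and 2 are illustrative assumptions.

import math

def logistic_predict(weights, attributes):
    # weights = [w0, w1, ..., wk]; attributes = [a1, ..., ak] (hypothetical inputs)
    s = weights[0] + sum(w * a for w, a in zip(weights[1:], attributes))
    prob = 1.0 / (1.0 + math.exp(-s))      # estimated Pr[class 1 | a1, ..., ak]
    # prob >= 0.5 exactly when s >= 0, i.e., when the instance lies on one side of the hyperplane
    return 1 if prob >= 0.5 else 2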
Multiresponse linear regression suffers from the same problem. Each class
receives a weight vector calculated from the training data. Focus for the moment
on a particular pair of classes. Suppose the weight vector for class 1 is

$w_0^{(1)} + w_1^{(1)} a_1 + w_2^{(1)} a_2 + \cdots + w_k^{(1)} a_k$

and the same for class 2 with appropriate superscripts. Then, an instance will
be assigned to class 1 rather than class 2 if

$w_0^{(1)} + w_1^{(1)} a_1 + \cdots + w_k^{(1)} a_k > w_0^{(2)} + w_1^{(2)} a_1 + \cdots + w_k^{(2)} a_k.$

In other words, it will be assigned to class 1 if

$(w_0^{(1)} - w_0^{(2)}) + (w_1^{(1)} - w_1^{(2)}) a_1 + \cdots + (w_k^{(1)} - w_k^{(2)}) a_k > 0.$
This is a linear inequality in the attribute values, so the boundary between each
pair of classes is a hyperplane. The same holds true when performing pairwise
classification. The only difference is that the boundary between two classes is
governed by the training instances in those classes and is not influenced by the
other classes.
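As an illustration only (the book gives no code here), the following Python sketch compares the two linear predictions directly; the names assign_class, w1, and w2 are hypothetical, and ties are arbitrarily given to class 2.

def assign_class(w1, w2, attributes):
    # w1 and w2 are the weight lists [w0, w1, ..., wk] learned for classes 1 and 2
    a = [1.0] + list(attributes)                # prepend a0 = 1 for the constant term
    score1 = sum(w * x for w, x in zip(w1, a))  # class 1 prediction
    score2 = sum(w * x for w, x in zip(w2, a))  # class 2 prediction
    # Equivalent test on the weight differences: sum((u - v) * x for u, v, x in zip(w1, w2, a)) > 0
    return 1 if score1 > score2 else 2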

Linear classification using the perceptron


Logistic regression attempts to produce accurate probability estimates by
maximizing the probability of the training data. Of course, accurate probability
estimates lead to accurate classifications. However, it is not necessary to perform
probability estimation if the sole purpose of the model is to predict class labels.
A different approach is to learn a hyperplane that separates the instances
pertaining to the different classes (let's assume that there are only two of them). If
the data can be separated perfectly into two groups using a hyperplane, it is said
to be linearly separable. It turns out that if the data is linearly separable, there
is a very simple algorithm for finding a separating hyperplane.

The algorithm is called the perceptron learning rule. Before looking at it in
detail, let's examine the equation for a hyperplane again:

$w_0 a_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k = 0.$
Here, $a_1, a_2, \ldots, a_k$ are the attribute values, and $w_0, w_1, \ldots, w_k$ are the weights
that define the hyperplane. We will assume that each training instance $a_1, a_2, \ldots$
is extended by an additional attribute $a_0$ that always has the value 1 (as we
did in the case of linear regression). This extension, which is called the bias, just
means that we do not have to carry a separate constant term outside the sum.
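Assuming the usual convention that a positive sum means the first class (an assumption here, since the excerpt does not state it), a minimal Python sketch of classifying an extended instance might look like this; the function name is hypothetical.

def hyperplane_predict(weights, instance):
    # weights = [w0, w1, ..., wk]; instance = [a1, ..., ak]
    extended = [1.0] + list(instance)           # add the bias attribute a0 = 1
    total = sum(w * a for w, a in zip(weights, extended))
    # total tells us which side of the hyperplane w0*a0 + w1*a1 + ... + wk*ak = 0 the instance lies on
    return "first class" if total > 0 else "second class"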


