
$-w_0 - w_1 a_1 - \cdots - w_k a_k = 0.$

Because this is a linear equality in the attribute values, the boundary is a linear
plane, or hyperplane, in instance space. It is easy to visualize sets of points that
cannot be separated by a single hyperplane, and these cannot be discriminated
correctly by logistic regression.
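To make this concrete, here is a minimal Python sketch (not from the book) of the two-class decision: the estimated probability crosses 0.5 exactly where the weighted sum is zero, so only the side of the hyperplane matters. The function name and the class labels 1 and 2 are illustrative assumptions.

import math

def logistic_predict(weights, attributes):
    # weights = [w0, w1, ..., wk]; attributes = [a1, ..., ak] (hypothetical inputs)
    s = weights[0] + sum(w * a for w, a in zip(weights[1:], attributes))
    prob = 1.0 / (1.0 + math.exp(-s))      # estimated Pr[class 1 | a1, ..., ak]
    # prob >= 0.5 exactly when s >= 0, i.e., when the instance lies on one side of the hyperplane
    return 1 if prob >= 0.5 else 2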
Multiresponse linear regression suffers from the same problem. Each class
receives a weight vector calculated from the training data. Focus for the moment
on a particular pair of classes. Suppose the weight vector for class 1 is

$w_0^{(1)} + w_1^{(1)} a_1 + w_2^{(1)} a_2 + \cdots + w_k^{(1)} a_k$

and the same for class 2 with appropriate superscripts. Then, an instance will
be assigned to class 1 rather than class 2 if

$w_0^{(1)} + w_1^{(1)} a_1 + \cdots + w_k^{(1)} a_k > w_0^{(2)} + w_1^{(2)} a_1 + \cdots + w_k^{(2)} a_k.$

In other words, it will be assigned to class 1 if

$(w_0^{(1)} - w_0^{(2)}) + (w_1^{(1)} - w_1^{(2)}) a_1 + \cdots + (w_k^{(1)} - w_k^{(2)}) a_k > 0.$
This is a linear inequality in the attribute values, so the boundary between each
pair of classes is a hyperplane. The same holds true when performing pairwise
classification. The only difference is that the boundary between two classes is
governed by the training instances in those classes and is not influenced by the
other classes.
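As an illustration only (the book gives no code here), the following Python sketch compares the two linear predictions directly; the names assign_class, w1, and w2 are hypothetical, and ties are arbitrarily given to class 2.

def assign_class(w1, w2, attributes):
    # w1 and w2 are the weight lists [w0, w1, ..., wk] learned for classes 1 and 2
    a = [1.0] + list(attributes)                # prepend a0 = 1 for the constant term
    score1 = sum(w * x for w, x in zip(w1, a))  # class 1 prediction
    score2 = sum(w * x for w, x in zip(w2, a))  # class 2 prediction
    # Equivalent test on the weight differences: sum((u - v) * x for u, v, x in zip(w1, w2, a)) > 0
    return 1 if score1 > score2 else 2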

Linear classification using the perceptron


Logistic regression attempts to produce accurate probability estimates by
maximizing the probability of the training data. Of course, accurate probability
estimates lead to accurate classifications. However, it is not necessary to perform
probability estimation if the sole purpose of the model is to predict class labels.
A different approach is to learn a hyperplane that separates the instances
pertaining to the different classes (let's assume that there are only two of them). If
the data can be separated perfectly into two groups using a hyperplane, it is said
to be linearly separable. It turns out that if the data is linearly separable, there
is a very simple algorithm for finding a separating hyperplane.

The algorithm is called the perceptron learning rule. Before looking at it in
detail, let's examine the equation for a hyperplane again:

$w_0 a_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k = 0.$
Here, $a_1, a_2, \ldots, a_k$ are the attribute values, and $w_0, w_1, \ldots, w_k$ are the weights
that define the hyperplane. We will assume that each training instance $a_1, a_2, \ldots$
is extended by an additional attribute $a_0$ that always has the value 1 (as we
did in the case of linear regression). This extension, which is called the bias, just
means that we do not have to carry a separate constant term outside the sum.
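Assuming the usual convention that a positive sum means the first class (an assumption here, since the excerpt does not state it), a minimal Python sketch of classifying an extended instance might look like this; the function name is hypothetical.

def hyperplane_predict(weights, instance):
    # weights = [w0, w1, ..., wk]; instance = [a1, ..., ak]
    extended = [1.0] + list(instance)           # add the bias attribute a0 = 1
    total = sum(w * a for w, a in zip(weights, extended))
    # total tells us which side of the hyperplane w0*a0 + w1*a1 + ... + wk*ak = 0 the instance lies on
    return "first class" if total > 0 else "second class"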


