x = w_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k,
where x is the class; a_1, a_2, ..., a_k are the attribute values; and w_0, w_1, ..., w_k are
weights.
The weights are calculated from the training data. Here the notation gets a
little heavy, because we need a way of expressing the attribute values for each
training instance. The first instance will have a class, say x^{(1)}, and attribute values
a_1^{(1)}, a_2^{(1)}, ..., a_k^{(1)}, where the superscript denotes that it is the first example.
Moreover, it is notationally convenient to assume an extra attribute a_0 whose
value is always 1.
The predicted value for the first instance's class can be written as

w_0 a_0^{(1)} + w_1 a_1^{(1)} + w_2 a_2^{(1)} + \cdots + w_k a_k^{(1)} = \sum_{j=0}^{k} w_j a_j^{(1)}.
This is the predicted, not the actual, value for the first instance's class. Of interest
is the difference between the predicted and the actual values. The method of
linear regression is to choose the coefficients w_j (there are k + 1 of them) to
minimize the sum of the squares of these differences over all the training
instances. Suppose there are n training instances; denote the ith one with a
superscript (i). Then the sum of the squares of the differences is

\sum_{i=1}^{n} \left( x^{(i)} - \sum_{j=0}^{k} w_j a_j^{(i)} \right)^2,
where the expression inside the parentheses is the difference between the ith
instance’s actual class and its predicted class. This sum of squares is what we
have to minimize by choosing the coefficients appropriately.
This is all starting to look rather formidable. However, the minimization
technique is straightforward if you have the appropriate math background.
Suffice it to say that given enough examples (roughly speaking, more examples
than attributes) choosing weights to minimize the sum of the squared differences
is really not difficult. It does involve a matrix inversion operation, but this
is readily available as prepackaged software.
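
To make the matrix formulation concrete, here is a minimal sketch in Python with NumPy; the data values and variable names are invented for illustration and are not from the book. It builds the attribute matrix with the extra always-1 attribute a_0 and solves the normal equations, which is where the matrix inversion mentioned above comes in:

```python
import numpy as np

# Invented training data: n = 4 instances, k = 2 attributes, numeric class x.
A = np.array([[2.0, 3.0],
              [1.0, 5.0],
              [4.0, 1.0],
              [3.0, 3.0]])             # attribute values a_1, a_2 per instance
x = np.array([13.0, 16.0, 9.0, 14.0])  # actual class values x^(1), ..., x^(n)

n, k = A.shape

# Prepend the extra attribute a_0, whose value is always 1.
A1 = np.hstack([np.ones((n, 1)), A])   # shape (n, k + 1)

# Minimizing the sum of squared differences leads to the normal equations
# (A1^T A1) w = A1^T x; solving them is the matrix inversion step.
w = np.linalg.solve(A1.T @ A1, A1.T @ x)

print(w)   # the k + 1 weights w_0, w_1, ..., w_k
```

In practice one would usually call np.linalg.lstsq(A1, x, rcond=None) rather than forming A1.T @ A1 explicitly, because it is numerically more stable; the prepackaged routines the text refers to do something along those lines.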
Once the math has been accomplished, the result is a set of numeric weights,
based on the training data, which we can use to predict the class of new
instances. We saw an example of this when looking at the CPU performance
data, and the actual numeric weights are given in Figure 3.7(a). This formula
can be used to predict the CPU performance of new test instances.
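
Continuing the sketch above (again with invented numbers), predicting the class of a new instance is just the weighted sum of its attribute values, with the leading 1 standing in for a_0:

```python
# Hypothetical new instance with attribute values a_1 = 2.5, a_2 = 4.0.
a_new = np.array([2.5, 4.0])

# Predicted class: w_0 * 1 + w_1 * a_1 + ... + w_k * a_k.
prediction = w @ np.concatenate(([1.0], a_new))
print(prediction)
```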
Linear regression is an excellent, simple method for numeric prediction, and
it has been widely used in statistical applications for decades. Of course, linear
models suffer from the disadvantage of, well, linearity. If the data exhibits a
nonlinear dependency, the best-fitting straight line will be found, where "best" is
interpreted as the least mean-squared difference. This line may not fit very well.
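
As a small illustration of that limitation (with made-up data, not an example from the book), fitting a line to values generated by x = a^2 still succeeds in the least-squares sense, but the residual sum of squares stays clearly nonzero:

```python
import numpy as np

a = np.arange(1.0, 6.0)   # attribute values 1, 2, 3, 4, 5
x = a ** 2                # a deliberately nonlinear class: 1, 4, 9, 16, 25

A1 = np.column_stack([np.ones_like(a), a])
w, residuals, _, _ = np.linalg.lstsq(A1, x, rcond=None)

print(w)          # best-fitting line: x = -7 + 6a
print(residuals)  # sum of squared differences = 14: the line misses the curve
```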

