x = w_0 + w_1 a_1 + w_2 a_2 + \cdots + w_k a_k,
where x is the class; a_1, a_2, ..., a_k are the attribute values; and w_0, w_1, ..., w_k are
weights.
The weights are calculated from the training data. Here the notation gets a
little heavy, because we need a way of expressing the attribute values for each
training instance. The first instance will have a class, say x^{(1)}, and attribute values
a_1^{(1)}, a_2^{(1)}, ..., a_k^{(1)}, where the superscript denotes that it is the first example.
Moreover, it is notationally convenient to assume an extra attribute a_0 whose
value is always 1.
The predicted value for the first instance's class can be written as

w_0 a_0^{(1)} + w_1 a_1^{(1)} + w_2 a_2^{(1)} + \cdots + w_k a_k^{(1)} = \sum_{j=0}^{k} w_j a_j^{(1)}.
This is the predicted, not the actual, value for the first instance's class. Of interest
is the difference between the predicted and the actual values. The method of
linear regression is to choose the coefficients w_j (there are k + 1 of them) to
minimize the sum of the squares of these differences over all the training
instances. Suppose there are n training instances; denote the ith one with a
superscript (i). Then the sum of the squares of the differences is

\sum_{i=1}^{n} \left( x^{(i)} - \sum_{j=0}^{k} w_j a_j^{(i)} \right)^2,
where the expression inside the parentheses is the difference between the ith
instance’s actual class and its predicted class. This sum of squares is what we
have to minimize by choosing the coefficients appropriately.
This is all starting to look rather formidable. However, the minimization
technique is straightforward if you have the appropriate math background.
Suffice it to say that given enough examples (roughly speaking, more examples
than attributes) choosing weights to minimize the sum of the squared differences
is really not difficult. It does involve a matrix inversion operation, but this
is readily available as prepackaged software.
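
To make the matrix formulation concrete, here is a minimal sketch in Python with NumPy; the data values and variable names are invented for illustration and are not from the book. It builds the attribute matrix with the extra always-1 attribute a_0 and solves the normal equations, which is where the matrix inversion mentioned above comes in:

```python
import numpy as np

# Invented training data: n = 4 instances, k = 2 attributes, numeric class x.
A = np.array([[2.0, 3.0],
              [1.0, 5.0],
              [4.0, 1.0],
              [3.0, 3.0]])             # attribute values a_1, a_2 per instance
x = np.array([13.0, 16.0, 9.0, 14.0])  # actual class values x^(1), ..., x^(n)

n, k = A.shape

# Prepend the extra attribute a_0, whose value is always 1.
A1 = np.hstack([np.ones((n, 1)), A])   # shape (n, k + 1)

# Minimizing the sum of squared differences leads to the normal equations
# (A1^T A1) w = A1^T x; solving them is the matrix inversion step.
w = np.linalg.solve(A1.T @ A1, A1.T @ x)

print(w)   # the k + 1 weights w_0, w_1, ..., w_k
```

In practice one would usually call np.linalg.lstsq(A1, x, rcond=None) rather than forming A1.T @ A1 explicitly, because it is numerically more stable; the prepackaged routines the text refers to do something along those lines.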
Once the math has been accomplished, the result is a set of numeric weights,
based on the training data, which we can use to predict the class of new
instances. We saw an example of this when looking at the CPU performance
data, and the actual numeric weights are given in Figure 3.7(a). This formula
can be used to predict the CPU performance of new test instances.
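
Continuing the sketch above (again with invented numbers), predicting the class of a new instance is just the weighted sum of its attribute values, with the leading 1 standing in for a_0:

```python
# Hypothetical new instance with attribute values a_1 = 2.5, a_2 = 4.0.
a_new = np.array([2.5, 4.0])

# Predicted class: w_0 * 1 + w_1 * a_1 + ... + w_k * a_k.
prediction = w @ np.concatenate(([1.0], a_new))
print(prediction)
```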
Linear regression is an excellent, simple method for numeric prediction, and
it has been widely used in statistical applications for decades. Of course, linear
models suffer from the disadvantage of, well, linearity. If the data exhibits a
nonlinear dependency, the best-fitting straight line will be found, where "best" is
interpreted as the least mean-squared difference. This line may not fit very well.
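
As a small illustration of that limitation (with made-up data, not an example from the book), fitting a line to values generated by x = a^2 still succeeds in the least-squares sense, but the residual sum of squares stays clearly nonzero:

```python
import numpy as np

a = np.arange(1.0, 6.0)   # attribute values 1, 2, 3, 4, 5
x = a ** 2                # a deliberately nonlinear class: 1, 4, 9, 16, 25

A1 = np.column_stack([np.ones_like(a), a])
w, residuals, _, _ = np.linalg.lstsq(A1, x, rcond=None)

print(w)          # best-fitting line: x = -7 + 6a
print(residuals)  # sum of squared differences = 14: the line misses the curve
```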

