Data Mining: Practical Machine Learning Tools and Techniques, Second Edition
and discard the rest. In this case, three principal components account for 84% of the variance in the dataset; seven account for ...
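The fraction-of-variance bookkeeping is straightforward to reproduce. Below is a minimal numpy sketch (not the book's Weka implementation): standardize the attributes, diagonalize the covariance matrix, and keep components until the cumulative variance passes a threshold. The 95% cutoff and the placeholder data are illustrative assumptions.

```python
import numpy as np

def pca_explained_variance(X):
    """Return principal components and the fraction of variance each explains."""
    # Standardize attributes first, as the text recommends
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs, eigvals / eigvals.sum()

# Keep only enough components to cover, say, 95% of the variance
X = np.random.rand(100, 10)                      # placeholder data
components, explained = pca_explained_variance(X)
k = int(np.searchsorted(np.cumsum(explained), 0.95)) + 1
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
X_reduced = Xs @ components[:, :k]               # discard the remaining components
```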
outcome of principal components analysis, and it is common practice to standardize all att ...
capture any internal structure of the string or bring out any interesting aspects of the text it represents. You could imagine d ...
the frequencies f_ij for word i in document j can be transformed in various standard ways. O ...
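The best known of these standard transformations is TF x IDF, which multiplies the frequency f_ij by the logarithm of the inverse document frequency of word i. A minimal sketch, assuming the frequencies are held in a numpy array indexed as counts[i, j]:

```python
import numpy as np

def tfidf(counts):
    """TF x IDF: weight f_ij by log(N / n_i), where N is the number of
    documents and n_i the number of documents containing word i.
    Assumes every word occurs in at least one document."""
    N = counts.shape[1]
    n_i = (counts > 0).sum(axis=1)       # document frequency of each word
    return counts * np.log(N / n_i)[:, None]

counts = np.array([[2.0, 0.0, 1.0],     # word 0 appears in 2 of 3 documents
                   [1.0, 1.0, 1.0],     # word 1 appears everywhere: weight 0
                   [0.0, 3.0, 0.0]])    # word 2 is specific to document 1
print(tfidf(counts))
```

Note that a word occurring in every document gets weight zero, matching the intuition that such words carry no discriminating information.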
7.4 Automatic data cleansing

A problem that plagues practical data mining is poor quality of the data. Errors in large databases ...
Interestingly enough, it has been shown that when artificial noise is added to attributes (rath ...
remarkably unperturbed. This line has a simple and natural interpretation. Geometrically, it corresponds to finding the narrow ...
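The line in question comes from least median of squares regression, which minimizes the median rather than the sum of the squared residuals. A brute-force sketch, adequate for small datasets; the pair-of-points search and the toy data are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def least_median_of_squares(x, y):
    """Fit a line by minimizing the MEDIAN of squared residuals.

    Brute force over lines through every pair of points -- enough to
    illustrate the 'narrowest strip covering half the points' picture.
    """
    best, best_med = None, np.inf
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if med < best_med:
            best, best_med = (slope, intercept), med
    return best

# A gross outlier barely moves the fitted line
x = np.arange(10.0)
y = 2 * x + 1
y[9] = 100.0
print(least_median_of_squares(x, y))   # close to (2, 1)
```

Because the median ignores the largest half of the residuals, a handful of gross outliers cannot drag the line away from the strip that covers the majority of the points.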
dataset. The suspicion will remain that perhaps the new dataset is simply unsuited to decision ...
learning techniques do this by learning an ensemble of models and using them in combination: prominent among these are schemes c ...
particularly if the training datasets are fairly small. This is a rather disturbing fact and s ...
and variance: this is the bias–variance decomposition. Combining multiple classifiers decreases the expected error by reducing ...
reduces the expected value of the mean-squared error. (As we mentioned earlier, the analogous ...
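For regression with squared error, the decomposition the text refers to can be written as follows (for a fixed test point, with the expectation taken over training sets; formulations that model noisy targets add an irreducible-noise term):

```latex
\mathbb{E}\big[(\hat{y}-y)^2\big]
  \;=\; \underbrace{\big(\mathbb{E}[\hat{y}]-y\big)^2}_{\text{bias}^2}
  \;+\; \underbrace{\mathbb{E}\big[(\hat{y}-\mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}
```

Averaging many predictors leaves the first term alone but shrinks the second, which is why combining models helps most with unstable, high-variance learners.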
training instance the prediction that minimizes the expected cost, based on the probability estimates obtained from bagging. Met ...
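A minimal sketch of bagged probability estimates, assuming numpy arrays and scikit-learn trees (the book's experiments use Weka). The averaging here also assumes every class turns up in every bootstrap sample, so the predict_proba columns line up:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_probabilities(X, y, X_test, n_models=10, seed=0):
    """Bagging: train each tree on a bootstrap sample of the training
    data, then average the class-probability estimates."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        probs.append(tree.predict_proba(X_test))
    return np.mean(probs, axis=0)
```

Given a hypothetical cost matrix C, with C[k, j] the cost of predicting class k when the truth is j, the minimum-expected-cost prediction in the style described here is then np.argmin(probs @ C.T, axis=1).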
Randomization demands more work than bagging because the learning algorithm must be modified ...
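One common modification is to make the tree learner choose each split from a random subset of the attributes. The sketch below uses scikit-learn's max_features option to stand in for that change; the square-root subset size and majority voting are illustrative choices, not the book's prescription:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def randomized_ensemble(X, y, n_models=10, seed=0):
    """Randomization: every model sees the same training data, but each
    tree picks its splits from a random subset of attributes per node."""
    return [DecisionTreeClassifier(max_features="sqrt",
                                   random_state=seed + m).fit(X, y)
            for m in range(n_models)]

def vote(models, X_test):
    """Majority vote across the ensemble (assumes integer class labels)."""
    votes = np.array([m.predict(X_test) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```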
weight. Such instances become particularly important because there is a greater incentive to classify them correctly. The C4.5 a ...
How much should the weights be altered after each iteration? The answer depends on the current ...
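In the AdaBoost.M1 procedure this section builds toward, the answer is driven by the model's weighted error e on the current weights: correctly classified instances are down-weighted by the factor e/(1-e), the weights are renormalized, and the same quantity fixes the model's say in the final vote:

```latex
w_i \;\leftarrow\; w_i \cdot \frac{e}{1-e}
  \quad \text{(correctly classified $i$; then renormalize)},
\qquad
\text{vote weight} \;=\; -\log\frac{e}{1-e}.
```

As e approaches 0 the surviving weight piles onto the hard instances and the model's vote grows; the procedure terminates if e reaches 0 or exceeds 0.5.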
A disadvantage of this procedure is that some instances with low weight don’t make it into the resampled dataset, so information ...
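Resampling itself is a one-liner: draw a dataset of the original size with selection probabilities proportional to the weights. A sketch, assuming numpy arrays (the function name is mine):

```python
import numpy as np

def resample_by_weight(X, y, weights, seed=0):
    """Boosting by resampling: draw a dataset of the same size, with each
    instance's selection probability proportional to its weight. Low-weight
    instances may be left out entirely, the drawback the text notes."""
    rng = np.random.default_rng(seed)
    p = weights / weights.sum()
    idx = rng.choice(len(X), size=len(X), replace=True, p=p)
    return X[idx], y[idx]
```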
The beautiful thing about boosting is that a powerful combined classifier can be built from ve ...
these errors by learning a second model—perhaps another regression tree—that tries to predict the observed residuals. To do this ...
of iterations needed to arrive at a good additive model. Reducing the multiplier effectively d ...
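A compact sketch of the whole loop, using scikit-learn regression trees; the tree depth, the iteration count, and the 0.1 multiplier are illustrative assumptions rather than recommended settings:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_regression(X, y, n_models=50, multiplier=0.1, seed=0):
    """Forward stagewise additive regression: each tree is fit to the
    residuals left by the sum of its predecessors, and the multiplier
    shrinks each contribution."""
    models, residual = [], y.astype(float)
    for m in range(n_models):
        tree = DecisionTreeRegressor(max_depth=3, random_state=seed + m)
        tree.fit(X, residual)
        residual = residual - multiplier * tree.predict(X)
        models.append(tree)
    return models

def predict_additive(models, X, multiplier=0.1):
    """Sum the (shrunken) contributions of all the models."""
    return multiplier * sum(m.predict(X) for m in models)
```

Shrinking each contribution means more iterations are needed, but each step commits less firmly to the current residuals, which is the regularizing effect the text is describing.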