Data Mining: Practical Machine Learning Tools and Techniques, Second Edition
to rule induction and continued with association rules, linear models, the nearest-neighbor method of instance-based learning, a ...
Because of the nature of the material it contains, this chapter differs from the others in the book. Sections can be read indepe ...
It is common to place numeric thresholds halfway between the values that delimit the boundaries of a concept, although something ...
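The halfway placement can be illustrated with a minimal sketch. It assumes that candidate thresholds are the midpoints between adjacent sorted values whose class labels differ; the function name and the sample values are hypothetical, and tied values with conflicting classes are simply skipped.

```python
def candidate_thresholds(values, labels):
    """Return midpoints between adjacent distinct values whose classes differ.

    Simplified illustration only: ties in value are skipped rather than
    handled specially.
    """
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 != v2 and c1 != c2:
            # Place the threshold halfway between the two boundary values.
            thresholds.append((v1 + v2) / 2)
    return thresholds


# Hypothetical numeric attribute values with their class labels.
result = candidate_thresholds([64, 65, 68, 69], ["yes", "no", "yes", "yes"])
print(result)
```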
whereas successive splits on a numeric attribute may continue to yield new information. Whereas a nominal attribute can only be ...
split at lower nodes, of course, if the values of other attributes are unknown as well.
Pruning
When we looked at the labor nego ...
which accounts for the entire subtree in Figure 1.3(a) being replaced by a single leaf marked bad. Finally, consideration would ...
It is no use taking the training set error as the error estimate: that would not lead to any pruning because the tree has been c ...
The mathematics involved is just the same as before. Given a particular confidence c (the default figure used by C4.5 is c = 25% ...
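As a hedged illustration of the kind of calculation involved, the sketch below computes an upper confidence limit on a node's error rate from its observed training errors. The formula is not shown in this excerpt: it is the standard upper bound from the normal approximation to the binomial, assumed here to be what the text refers to. The function name is hypothetical, and the default z = 0.69 is an assumed figure for c = 25% (the exact normal quantile is closer to 0.674).

```python
import math


def pessimistic_error(errors, n, z=0.69):
    """Upper confidence limit on the error rate at a node.

    errors: number of misclassified training instances at the node
    n:      total number of training instances at the node
    z:      normal deviate for the chosen confidence (assumed 0.69 for c = 25%)
    """
    f = errors / n  # observed error rate
    numerator = f + z * z / (2 * n) + z * math.sqrt(
        f / n - f * f / n + z * z / (4 * n * n)
    )
    return numerator / (1 + z * z / n)


# A leaf with 2 errors out of 6 instances: observed rate 0.33.
print(round(pessimistic_error(2, 6), 2))
```

The estimate is always more pessimistic than the observed rate, and the gap widens as the number of instances at the node shrinks, which is what makes small subtrees candidates for pruning.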
196 CHAPTER 6| IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES
the error estimate for the working hours node, so the subtree is pr ...
ing that most of the instances are different from each other, and (this is almost the same thing) that the m attributes provide eno ...
Taking into account all these operations, the full complexity of decision tree induction is
O(mn log n) + O(n (log n)^2)
From trees to rules
It is possible t ...
The more recent version, C5.0, is available commercially. Its decision tree induction seems to be essentially the same as that ...
6.2 Classification rules
We call the basic covering algorithm for generating rules that was described in Section 4.4 a separate- ...
number of instances that satisfied the rule before the new test was added. The rationale for this is ...
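A small sketch may help make the counts concrete. The measure itself is cut off in this excerpt, so the formula below is an assumption: an information-based gain of the form p[log(p/t) - log(P/T)], where p and t are the positive and total instances covered after the new test is added, and P and T are the corresponding counts before it was added. The function name and the use of base-2 logarithms are illustrative choices.

```python
import math


def rule_info_gain(p, t, P, T):
    """Information-based gain from adding a test to a rule (assumed form).

    p, t: positive and total instances covered AFTER adding the test
    P, T: positive and total instances covered BEFORE adding the test
    """
    if p == 0:
        return 0.0  # no positives covered, so nothing is gained
    return p * (math.log2(p / t) - math.log2(P / T))


# Before the test: 4 positives among 8 covered instances.
# After the test: 4 positives among 4 covered instances.
print(rule_info_gain(4, 4, 4, 8))
```

Because the gain is weighted by p, a test that keeps many positive instances scores higher than one that reaches the same accuracy on only a handful.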
deferred until most of the other instances have been taken care of, at which time tests will probably emerge that involve other ...
performs better than the original rule. This pruning process repeats until the rule cannot be impro ...
and this quantity, evaluated on the test set, has been used to evaluate the success of a rule when using reduced-error pruning. ...
Using global optimization
In general, rules generated using incremental reduced-error pruning in th ...
(a)
Initialize E to the instance set
For each class C, from smalle ...
description length, it replaces the rule. Next we reactivate the original building phase to mop up ...