Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

may be associated with the rules themselves to indicate that some are more
important, or more reliable, than others.
You might be wondering whether there is a smaller rule set that performs as
well. If there is, would you be better off using it and, if so, why?
These are exactly the kinds of questions that will occupy us in this book. Because
the examples form a complete set for the problem space, the rules do no more
than summarize all the information that is given, expressing it in a different and
more concise way. Even though it involves no generalization, this is often a very
useful thing to do! People frequently use machine learning techniques to gain
insight into the structure of their data rather than to make predictions for new
cases. In fact, a prominent and successful line of research in machine learning
began as an attempt to compress a huge database of possible chess endgames
and their outcomes into a data structure of reasonable size. The data structure
chosen for this enterprise was not a set of rules but a decision tree.
Figure 1.2 shows a structural description for the contact lens data in the form
of a decision tree, which for many purposes is a more concise and perspicuous
representation of the rules and has the advantage that it can be visualized more
easily. (However, this decision tree—in contrast to the rule set given in Figure
1.1—classifies two examples incorrectly.) The tree calls first for a test on tear
production rate, and the first two branches correspond to the two possible
outcomes. If tear production rate is reduced (the left branch), the outcome is
none. If it is normal (the right branch), a second test is made, this time on
astigmatism. Eventually, whatever the outcome of the tests, a leaf of the tree is reached

14 CHAPTER 1| WHAT’S IT ALL ABOUT?


tear production rate
  reduced: none
  normal: astigmatism
    no: soft
    yes: spectacle prescription
      myope: hard
      hypermetrope: none

Figure 1.2 Decision tree for the contact lens data.
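
Reading the tree in Figure 1.2 amounts to following a chain of tests from the root down to a leaf. The following is a minimal sketch of that procedure in Python, assuming the branch layout of the figure (tear production rate first, then astigmatism, then spectacle prescription). The names TREE and classify and the tuple-and-dictionary encoding are illustrative choices, not something the book prescribes.

```python
# Hypothetical encoding of the tree in Figure 1.2: an internal node is an
# (attribute, {value: subtree}) pair and a leaf is a plain class label.
TREE = (
    "tear production rate",
    {
        "reduced": "none",
        "normal": (
            "astigmatism",
            {
                "no": "soft",
                "yes": (
                    "spectacle prescription",
                    {"myope": "hard", "hypermetrope": "none"},
                ),
            },
        ),
    },
)


def classify(tree, example):
    """Apply the tests from the root until a leaf (a class label) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        # Follow the branch that matches this example's value for the attribute.
        tree = branches[example[attribute]]
    return tree


example = {
    "tear production rate": "normal",
    "astigmatism": "yes",
    "spectacle prescription": "myope",
}
print(classify(TREE, example))  # prints "hard"
```

Attributes that are never tested on the path taken are simply ignored, so an example with a reduced tear production rate is classified as none without looking at anything else.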