outliers, and so on. Most learning algorithms use statistical tests when con-
structing rules or trees and for correcting models that are “overfitted” in that
they depend too strongly on the details of the particular examples used to
produce them (we have already seen an example of this in the two decision trees
of Figure 1.3 for the labor negotiations problem). Statistical tests are used to
validate machine learning models and to evaluate machine learning algorithms.
In our study of practical techniques for data mining, we will learn a great deal
about statistics.
1.5 Generalization as search
One way of visualizing the problem of learning—and one that distinguishes it
from statistical approaches—is to imagine a search through a space of possible
concept descriptions for one that fits the data. Although the idea of generaliza-
tion as search is a powerful conceptual tool for thinking about machine learn-
ing, it is not essential for understanding the practical methods described in this
book. That is why this section is marked optional,as indicated by the gray bar
in the margin.
Suppose, for definiteness, that concepts—the result of learning—are
expressed as rules such as the ones given for the weather problem in Section 1.2
(although other concept description languages would do just as well). Suppose
that we list all possible sets of rules and then look for ones that satisfy a given
set of examples. A big job? Yes. An infinitejob? At first glance it seems so because
there is no limit to the number of rules there might be. But actually the number
of possible rule sets is finite. Note first that each individual rule is no greater
than a fixed maximum size, with at most one term for each attribute: for the
weather data of Table 1.2 this involves four terms in all. Because the number of
possible rules is finite, the number of possible rule setsis finite, too, although
extremely large. However, we’d hardly be interested in sets that contained a very
large number of rules. In fact, we’d hardly be interested in sets that had more
rules than there are examples because it is difficult to imagine needing more
than one rule for each example. So if we were to restrict consideration to rule
sets smaller than that, the problem would be substantially reduced, although
still very large.
The threat of an infinite number of possible concept descriptions seems more
serious for the second version of the weather problem in Table 1.3 because these
rules contain numbers. If they are real numbers, you can’t enumerate them, even
in principle. However, on reflection the problem again disappears because the
numbers really just represent breakpoints in the numeric values that appear in
the examples. For instance, consider the temperatureattribute in Table 1.3. It
involves the numbers 64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, and 85—12 dif-
30 CHAPTER 1| WHAT’S IT ALL ABOUT?