Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

If outlook = rainy and temperature = mild and humidity = high
and windy = false then play = yes
If outlook = rainy and temperature = cool and humidity = normal
and windy = false then play = yes
If outlook = overcast and temperature = cool and humidity = normal
and windy = true then play = yes
...
If none of the above then play = no

This is not a particularly enlightening concept description: it simply records
the positive examples that have been observed and assumes that all the rest
are negative. Each positive example is given its own rule, and the concept is
the disjunction of the rules. Alternatively, you could imagine having
individual rules for each of the negative examples too, an equally
uninteresting concept. In either case the concept description does not perform
any generalization; it simply records the original data.
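The rote-learning description above can be sketched in a few lines of code (an illustrative sketch, not from the book; the tuple encoding of the weather examples is our own):

```python
# One rule per memorized positive example; everything else defaults to "no".
positive_examples = {
    ("rainy", "mild", "high", "false"),
    ("rainy", "cool", "normal", "false"),
    ("overcast", "cool", "normal", "true"),
}

def play(outlook, temperature, humidity, windy):
    # The concept is just the disjunction of the memorized positives;
    # any unseen combination falls through to "no" -- no generalization.
    if (outlook, temperature, humidity, windy) in positive_examples:
        return "yes"
    return "no"
```

The classifier reproduces the training data perfectly and says nothing useful about any new example, which is exactly the failure the text describes.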
On the other hand, if disjunction is not allowed, some possible concepts
(sets of examples) may not be representable at all. In that case, a machine
learning scheme may simply be unable to achieve good performance.
Another kind of language bias is that obtained from knowledge of the
particular domain being used. For example, it may be that some combinations
of attribute values can never happen. This would be the case if one attribute
implied another. We saw an example of this when considering the rules for the
soybean problem described on page 20. Then it would be pointless to even
consider concepts that involve redundant or impossible combinations of
attribute values. Domain knowledge can be used to cut down the search space.
Knowledge is power: a little goes a long way, and even a small hint can
reduce the search space dramatically.
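To see how even one domain rule shrinks the space, consider this sketch over the weather attributes (the constraint "overcast implies not windy" is invented purely for illustration):

```python
from itertools import product

outlooks = ["sunny", "overcast", "rainy"]
temps = ["hot", "mild", "cool"]
humidities = ["high", "normal"]
windies = ["true", "false"]

def possible(combo):
    outlook, temp, humidity, windy = combo
    # Invented domain rule: overcast weather is never windy, so any
    # combination violating it can be pruned before the search begins.
    return not (outlook == "overcast" and windy == "true")

all_combos = list(product(outlooks, temps, humidities, windies))
valid = [c for c in all_combos if possible(c)]
print(len(all_combos), len(valid))  # 36 -> 30
```

One small hint removes a sixth of the combinations here; with many attributes, such constraints compound and the reduction can be dramatic.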


Search bias
In realistic data mining problems, there are many alternative concept
descriptions that fit the data, and the problem is to find the "best" one
according to some criterion, usually simplicity. We use the term fit in a
statistical sense; we seek the best description that fits the data reasonably
well. Moreover, it is often computationally infeasible to search the whole
space and guarantee that the description found really is the best.
Consequently, the search procedure is heuristic, and no guarantees can be made
about the optimality of the final result. This leaves plenty of room for bias:
different search heuristics bias the search in different ways.
For example, a learning algorithm might adopt a "greedy" search for rules by
trying to find the best rule at each stage and adding it in to the rule set.
However, it may be that the best pair of rules is not just the two rules that
are individually found to be the best. Or when building a decision tree, a
commitment to


1.5 GENERALIZATION AS SEARCH 33
