split early on using a particular attribute might turn out later to be ill consid-
ered in light of how the tree develops below that node. To get around these prob-
lems, a beam search could be used in which irrevocable commitments are not
made but instead a set of several active alternatives—whose number is the beam
width—is pursued in parallel. This will complicate the learning algorithm
quite considerably but has the potential to avoid the myopia associated with a
greedy search. Of course, if the beam width is not large enough, myopia may
still occur. There are more complex search strategies that help to overcome this
problem.
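To make the idea concrete, here is a minimal sketch of a beam search in Python; the refinements and score functions are hypothetical stand-ins for a learner's specialization operator and its evaluation measure, not part of any particular algorithm discussed here. With beam_width set to 1 the procedure reduces to an ordinary greedy search.

def beam_search(initial, refinements, score, beam_width=3, max_steps=10):
    # Keep the beam_width best candidates alive instead of committing
    # irrevocably to a single choice at each step.
    beam = [initial]
    best = initial
    for _ in range(max_steps):
        # Expand every candidate currently in the beam.
        candidates = [r for c in beam for r in refinements(c)]
        if not candidates:
            break
        # Retain only the beam_width highest-scoring successors.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best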
A more general and higher-level kind of search bias concerns whether the
search is done by starting with a general description and refining it, or by
starting with a specific example and generalizing it. The former is called a
general-to-specific search bias; the latter a specific-to-general one. Many learning
algorithms adopt the former policy, starting with an empty decision tree, or a
very general rule, and specializing it to fit the examples. However, it is perfectly
possible to work in the other direction. Instance-based methods start with a
particular example and see how it can be generalized to cover nearby examples
in the same class.
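As a rough illustration of the two directions, and not a description of any particular algorithm, the following Python sketch represents a rule as a set of attribute–value tests: the general-to-specific learner starts from the empty rule and adds tests taken from a seed example until no negative example is covered, whereas the specific-to-general learner starts from a single example and drops tests as long as no negative example becomes covered.

def covers(rule, instance):
    # A rule is a dict of attribute-value tests; it covers an instance
    # when every test is satisfied.
    return all(instance.get(a) == v for a, v in rule.items())

def general_to_specific(seed, negatives):
    # Refine the empty (most general) rule by adding tests from a seed
    # positive example until it excludes all negative examples.
    rule = {}
    for a, v in seed.items():
        if not any(covers(rule, n) for n in negatives):
            break
        rule[a] = v
    return rule

def specific_to_general(seed, negatives):
    # Generalize a single example (the most specific rule) by dropping
    # tests, provided the relaxed rule still excludes every negative.
    rule = dict(seed)
    for a in list(rule):
        relaxed = {k: v for k, v in rule.items() if k != a}
        if not any(covers(relaxed, n) for n in negatives):
            rule = relaxed
    return rule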
Overfitting-avoidance bias
Overfitting-avoidance bias is often just another kind of search bias. But because
it addresses a rather special problem, we treat it separately. Recall the disjunc-
tion problem described previously. The problem is that if disjunction is allowed,
useless concept descriptions that merely summarize the data become possible,
whereas if it is prohibited, some concepts are unlearnable. To get around this
problem, it is common to search the concept space starting with the simplest
concept descriptions and proceeding to more complex ones: simplest-first
ordering. This biases the search toward simple concept descriptions.
Using a simplest-first search and stopping when a sufficiently complex
concept description is found is a good way of avoiding overfitting. It is some-
times called forward pruning or prepruning because complex descriptions are
pruned away before they are reached. The alternative, backward pruning or
postpruning, is also viable. Here, we first find a description that fits the data well and
then prune it back to a simpler description that also fits the data. This is not as
redundant as it sounds: often the only way to arrive at a simple theory is to find
a complex one and then simplify it. Forward and backward pruning are both a
kind of overfitting-avoidance bias.
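The contrast can be sketched on a tiny decision-tree learner; the code below is only an illustration under simplifying assumptions (binary attributes, training-set error as the pruning criterion), whereas practical postpruning normally relies on a hold-out set or a pessimistic error estimate. With preprune set to True, growth stops as soon as the best split fails to reduce training error by min_gain; postprune instead takes the fully grown tree and collapses subtrees that do not help.

from collections import Counter

def majority(rows):
    # rows is a nonempty list of (attribute_dict, label) pairs.
    return Counter(label for _, label in rows).most_common(1)[0][0]

def leaf_error(rows, label):
    return sum(1 for _, y in rows if y != label)

def build(rows, attrs, min_gain=1, preprune=True):
    # Grow a tree of nested (attribute, left, right) tuples with labels
    # at the leaves; attributes are assumed to take values 0 or 1.
    label = majority(rows)
    best = None
    for a in attrs:
        left = [r for r in rows if r[0][a] == 0]
        right = [r for r in rows if r[0][a] == 1]
        if not left or not right:
            continue
        gain = (leaf_error(rows, label)
                - leaf_error(left, majority(left))
                - leaf_error(right, majority(right)))
        if best is None or gain > best[0]:
            best = (gain, a, left, right)
    if best is None or (preprune and best[0] < min_gain):
        return label  # prepruning: stop growing at this node
    _, a, left, right = best
    rest = [x for x in attrs if x != a]
    return (a, build(left, rest, min_gain, preprune),
               build(right, rest, min_gain, preprune))

def predict(tree, x):
    while isinstance(tree, tuple):
        a, left, right = tree
        tree = left if x[a] == 0 else right
    return tree

def tree_error(tree, rows):
    return sum(1 for x, y in rows if predict(tree, x) != y)

def postprune(tree, rows):
    # Prune bottom-up: collapse a subtree to a leaf whenever the leaf
    # fits these rows at least as well as the subtree it replaces.
    if not isinstance(tree, tuple):
        return tree
    a, left, right = tree
    tree = (a, postprune(left, [r for r in rows if r[0][a] == 0]),
               postprune(right, [r for r in rows if r[0][a] == 1]))
    label = majority(rows)
    if leaf_error(rows, label) <= tree_error(tree, rows):
        return label
    return tree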
In summary, although generalization as search is a nice way to think about
the learning problem, bias is the only way to make it feasible in practice. Dif-
ferent learning algorithms correspond to different concept description spaces
searched with different biases. This is what makes it interesting: different