Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

Now that we’ve seen how the inputs and outputs can be represented, it’s time
to look at the learning algorithms themselves. This chapter explains the basic
ideas behind the techniques that are used in practical data mining. We will not
delve too deeply into the trickier issues, such as advanced versions of the algorithms,
optimizations that are possible, and complications that arise in practice. These topics
are deferred to Chapter 6, where we come to grips with real implementations
of machine learning methods such as the ones included in data mining toolkits
and used for real-world applications. It is important to understand these more
advanced issues so that you know what is really going on when you analyze a
particular dataset.
In this chapter we look at the basic ideas. One of the most instructive lessons
is that simple ideas often work very well, and we strongly recommend the adoption
of a “simplicity-first” methodology when analyzing practical datasets. There
are many different kinds of simple structure that datasets can exhibit. In one
dataset, there might be a single attribute that does all the work and the others
may be irrelevant or redundant. In another dataset, the attributes might

Chapter 4

Algorithms: The Basic Methods