purchases? Should the manager move the most expensive, most profitable
diapers near the beer, increasing sales to harried fathers of a high-margin item
and add further luxury baby products nearby?
Of course, anyone who uses advanced technologies should consider the
wisdom of what they are doing. If data is characterized as recorded facts, then
information is the set of patterns, or expectations, that underlie the data. You
could go on to define knowledge as the accumulation of your set of expectations
and wisdom as the value attached to knowledge. Although we will not pursue it
further here, this issue is worth pondering.
As we saw at the very beginning of this chapter, the techniques described in
this book may be called upon to help make some of the most profound and
intimate decisions that life presents. Data mining is a technology that we need
to take seriously.
1.7 Further reading
To avoid breaking up the flow of the main text, all references are collected in a
section at the end of each chapter. This first Further reading section describes
papers, books, and other resources relevant to the material covered in Chapter
1. The human in vitro fertilization research mentioned in the opening to this
chapter was undertaken by the Oxford University Computing Laboratory,
and the research on cow culling was performed in the Computer Science
Department at the University of Waikato, New Zealand.
The example of the weather problem is from Quinlan (1986) and has been
widely used to explain machine learning schemes. The corpus of example prob-
lems mentioned in the introduction to Section 1.2 is available from Blake et al.
(1998). The contact lens example is from Cendrowska (1987), who introduced
the PRISM rule-learning algorithm that we will encounter in Chapter 4. The iris
dataset was described in a classic early paper on statistical inference (Fisher
1936). The labor negotiations data is from the Collective bargaining review, a
publication of Labour Canada issued by the Industrial Relations Information
Service (BLI 1988), and the soybean problem was first described by Michalski
and Chilausky (1980).
Some of the applications in Section 1.3 are covered in an excellent paper that
gives plenty of other applications of machine learning and rule induction
(Langley and Simon 1995); another source of fielded applications is a special
issue of the Machine Learning Journal (Kohavi and Provost 1998). The loan
company application is described in more detail by Michie (1989), the oil slick
detector is from Kubat et al. (1998), the electric load forecasting work is by
Jabbour et al. (1988), and the application to preventative maintenance of
electromechanical devices is from Saitta and Neri (1998). Fuller descriptions