deal with weighted instances in Section 6.5 under Locally weighted linear regression (page 252). One way of obtaining probability estimates from support vector machines is to fit a one-dimensional logistic model to the output, effectively performing logistic regression on the output as described in Section 4.6.
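To make that concrete, here is a minimal sketch of the idea, assuming scikit-learn; the synthetic dataset, the held-out calibration split, and all variable names are illustrative choices for this example, not anything prescribed by the text.

```python
# A minimal sketch of fitting a one-dimensional logistic model to SVM
# output to obtain probability estimates. Assumes scikit-learn; the
# synthetic data and the calibration split are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

# The SVM's raw output: signed distance from the separating hyperplane.
scores = svm.decision_function(X_cal).reshape(-1, 1)

# Logistic regression on that single number turns distances into
# class probabilities.
calibrator = LogisticRegression().fit(scores, y_cal)
probabilities = calibrator.predict_proba(scores)[:, 1]
```

Fitting the logistic model on examples the SVM never saw during training helps keep the resulting estimates from being overconfident.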
Excellent results have been reported for text classification using co-EM with the support vector machine (SVM) classifier. It outperforms other variants of SVM and seems quite robust to varying proportions of labeled and unlabeled data.
The ideas of co-training and EM—and particularly their combination in the co-EM algorithm—are interesting, thought provoking, and have striking potential. But just what makes them work is still controversial and poorly understood. These techniques are the subject of current research: they have not yet entered the mainstream of machine learning and been harnessed for practical data mining.
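Co-training itself is easy to state even if its behavior is not: two classifiers trained on different views of the data take turns pseudo-labeling the unlabeled examples they are most confident about. The sketch below, assuming scikit-learn and NumPy, illustrates plain co-training, not the co-EM variant whose results are reported above; the function name co_train, the two-view arrays, the confidence heuristic, and the naive Bayes base learners are all assumptions made for the example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, U1, U2, rounds=10, per_round=5):
    """X1, X2: two views of the labeled data; y: labels;
    U1, U2: the same unlabeled pool seen through each view."""
    c1, c2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        c1.fit(X1, y)
        c2.fit(X2, y)
        # Each classifier pseudo-labels the pool examples it is most
        # confident about; both views of those examples join the
        # training set, and the examples leave the pool.
        for clf in (c1, c2):
            if len(U1) == 0:
                return c1, c2
            view = U1 if clf is c1 else U2
            confidence = clf.predict_proba(view).max(axis=1)
            picks = np.argsort(confidence)[-per_round:]
            X1 = np.vstack([X1, U1[picks]])
            X2 = np.vstack([X2, U2[picks]])
            y = np.concatenate([y, clf.predict(view[picks])])
            keep = np.setdiff1d(np.arange(len(U1)), picks)
            U1, U2 = U1[keep], U2[keep]
    return c1.fit(X1, y), c2.fit(X2, y)
```

Co-EM differs in that the EM component relabels the whole unlabeled pool probabilistically at each iteration rather than committing to a few confident examples at a time.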

7.7 Further reading


Attribute selection, under the term feature selection, has been investigated in the field of pattern recognition for decades. Backward elimination, for example, was introduced in the early 1960s (Marill and Green 1963). Kittler (1978) surveys the feature selection algorithms that have been developed for pattern recognition. Best-first search and genetic algorithms are standard artificial intelligence techniques (Winston 1992, Goldberg 1989).
The experiments that show the performance of decision tree learners deteriorating when new attributes are added are reported by John (1997), who gives a nice explanation of attribute selection. The idea of finding the smallest attribute set that carves up the instances uniquely is from Almuallim and Dietterich (1991, 1992) and was further developed by Liu and Setiono (1996). Kibler and Aha (1987) and Cardie (1993) both investigated the use of decision tree algorithms to identify features for nearest-neighbor learning; Holmes and Nevill-Manning (1995) used 1R to order features for selection. Kira and Rendell (1992) used instance-based methods to select features, leading to a scheme called RELIEF for Recursive Elimination of Features. Gilad-Bachrach et al. (2004) show how this scheme can be modified to work better with redundant attributes. The correlation-based feature selection method was developed by Hall (2000).
The use of wrapper methods for feature selection is due to John et al. (1994) and Kohavi and John (1997), and genetic algorithms have been applied within a wrapper framework by Vafaie and DeJong (1992) and Cherkauer and Shavlik (1996). The selective Naïve Bayes learning method is due to Langley and Sage (1994). Guyon et al. (2002) present and evaluate the recursive feature elimination scheme in conjunction with support vector machines. The method of raced search was developed by Moore and Lee (1994).
