Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


document classification, 94–96, 352–353
document clustering, 353
domain knowledge, 20, 33, 349–351
double-consequent rules, 118
duplicate data, 59
dynamic programming, 302

E
early stopping, 233
easy instances, 322
ecological applications, 23, 28
eigenvalue, 307
eigenvector, 307
Einstein, Albert, 180
electricity supply, 24–25
electromechanical diagnosis application, 144
11-point average recall, 172
EM, 418
EM algorithm, 265–266
EM and co-training, 340–341
EM procedure, 337–338
embedded machine learning, 461–469
engineering input and output, 285–343
  attribute selection, 288–296
  combining multiple models, 315–336
  data cleansing, 312–315
  discretizing numeric attributes, 296–305
  unlabeled data, 337–341
  See also individual subject headings
entity extraction, 353
entropy, 102
entropy-based discretization, 298–302
enumerated attributes, 50. See also nominal attributes
enumerating the concept space, 31–32
Epicurus, 183
epoch, 412
equal-frequency binning, 298
equal-interval binning, 298
equal-width binning, 342
erroneous values, 59
error-based discretization, 302–304
error-correcting output codes, 334–336
error log, 378

error rate
  bias, 317
  cost of errors. See cost of errors
  decision tree, 192–196
  defined, 144
  training data, 145
“Essay towards solving a problem in the doctrine of chances, An” (Bayes), 141
ethics, 35–37
Euclidean distance, 78, 128, 129, 237
evaluation, 143–185
  bootstrap procedure, 152–153
  comparing data mining methods, 153–157
  cost of errors, 161–176. See also cost of errors
  cross-validation, 149–152
  leave-one-out cross-validation, 151–152
  MDL principle, 179–184
  numeric prediction, 176–179
  predicting performance, 146–149
  predicting probabilities, 157–161
  training and testing, 144–146
evaluation(), 482
evaluation components in Weka, 430, 431
Evaluation panel, 431
example problems
  contact lens data, 6, 13–15
  CPU performance data, 16–17
  iris dataset, 15–16
  labor negotiations data, 17–18, 19
  soybean data, 18–22
  weather problem, 10–12
exceptions, 70–73, 210–213
exclusive-or problem, 67
exemplar
  defined, 236
  generalized, 238–239
  noisy, 236–237
  redundant, 236
exemplar generalization, 238–239, 243
ExhaustiveSearch, 424
Expand all paths, 408
expectation, 265, 267
expected error, 174
expected success rate, 147
