508 INDEX
causal relations, 350
CfsSubsetEval, 422
chain rule (probability theory), 275
character sets, 310
ChiSquaredAttributeEval, 423
chi-squared test, 302
circular ordering, 51
city-block metric, 129
ClassAssigner, 431
class attribute, 43
class distribution, 304
class hierarchy in Weka, 471–483
classification, 121
Classification And Regression Tree (CART), 29, 38, 199, 253
classification learning, 43
classification problems, 42
classification rules, 65–69, 200–214
    converting decision trees to, 198
    criteria for choosing tests, 200–201
    decision list, 11
    different from association rules, 42
    global optimization, 205–207
    good (worthwhile) rules, 202–205
    missing values, 201–202
    numeric attributes, 202
    partial decision tree, 207–210
    pruning, 203, 205
    replicated subtree problem, 66–68
    RIPPER rule learner, 205
    rules with exceptions, 210–213
    Weka, 408–409
Classifier, 453
classifier in Weka, 366, 471–483
classifier algorithms. See learning algorithms
classifier errors in Weka, 379
ClassifierPerformanceEvaluator, 431
classifiers package, 453–455
ClassifierSubsetEval, 422
Classifier superclass, 480
classifyInstance(), 453, 480–481
Classify panel, 373, 384
classify text files into two categories, 461–469
Classit, 271
class noise, 313
ClassOrder, 403
class summary, 451
ClassValuePicker, 431
cleaning data, 52, 60
    automatic, 312
closed world assumption, 45
ClustererPerformanceEvaluator, 431
clustering, 43, 136–139, 254–271
    anomalies, 258
    basic assumption, 270
    Bayesian, 268–270
    category utility, 260–262
    Cluster panel (Weka), 391–392
    document, 353
    EM algorithm, 265–266
    faster distance calculations, 138–139
    hierarchical, 139
    how many clusters?, 254–255
    incremental, 255–260
    k-means, 137–138
    MDL principle, 183–184
    merging, 257
    mixture model, 262–264, 266–268
    output, 81–82
    probability-based, 262–265
    RBF network, 234
    splitting, 254–255, 257
    unlabeled data, 337–339
clustering algorithms in Weka, 418–419
clustering for classification, 337
ClusterMembership, 396, 397
Cluster panel, 391–392
Cobweb, 271
Cobweb in Weka, 419
co-EM, 340
column separation, 336
combining multiple models, 287, 315–336
    additive logistic regression, 327–328
    additive regression, 325–327
    bagging, 316–319
    bagging with costs, 319–320
    boosting, 321–325
    error-correcting output codes, 334–336
    logistic model trees, 331
    option trees, 328–331