Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


INDEX


implementation—real-world schemes
(continued)
  numeric prediction, 243–254
  See also individual subject headings
inaccurate values, 59–60. See also cost of errors; data cleaning; error rate
incremental algorithms, 346
incrementalClassifier, 434
IncrementalClassifierEvaluator, 431
incremental clustering, 255–260
incremental learning in Weka, 433–435
incremental reduced-error pruning, 203, 205
independent attributes, 267
index(), 472
induction, 29
inductive logic programming, 48, 60, 75, 351
Induct system, 214
industrial usage. See implementation—real-world schemes
inferring rudimentary rules, 84–88
InfoGainAttributeEval, 422–423
informational loss function, 159–160, 161
information-based heuristic, 201
information extraction, 354
information gain, 99
information retrieval, 171
information value, 102
infrequent words, 353
inner cross-validation, 286
input, 41–60
  ARFF format, 53–55
  assembling the data, 52–53
  attribute, 49–52
  attribute types, 56–57
  concept, 42–45
  data engineering, 286–287, 288–315. See also engineering input and output
  data preparation, 52–60
  getting to know your data, 60
  inaccurate values, 59–60
  instances, 45
  missing values, 58
  sparse data, 55–56
input layer, 224
instance in Weka, 450
Instance, 451
instance-based learning, 78, 128–136, 235–243
  ball tree, 133–135
  distance functions, 128–129, 239–242
  finding nearest neighbors, 129–135
  generalized distance functions, 241–242
  generalized exemplars, 236
  kD-trees, 130–132
  missing values, 129
  pruning noisy exemplars, 236–237
  redundant exemplars, 236
  simple method, 128–136, 235–236
  weighting attributes, 237–238
  Weka, 413–414
instance-based learning methods, 291
instance-based methods, 34
instance-based representation, 76–80
instance filters in Weka, 394, 400–401, 403
instances, 45
Instances, 451
instance space, 79
instance weights, 166, 321–322
integer-valued attributes, 49
intensive care patients, 29
interval, 88
interval quantities, 50–51
intrusion detection systems, 357
invertSelection, 382
in vitro fertilization, 3
iris dataset, 15–16
iris setosa, 15
iris versicolor, 15
iris virginica, 15
ISO-8601 combined date and time format, 55
item, 113
item sets, 113, 114–115
iterative distance-based clustering, 137–138

J
J4.8, 373–377
J48, 404, 450
Javadoc indices, 456
JDBC database, 445
JRip, 409
junk email filtering, 356–357
