Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

510 INDEX


CSVLoader, 381
cumulative margin distribution in Weka, 458
curves
  cost, 173
  lift, 166
  recall-precision, 171
  ROC, 168
customer support and service, 28
cutoff parameter, 260
CVParameterSelection, 417
cybersecurity, 29

D
dairy farmers (New Zealand), 3–4, 37, 161–162
data assembly, 52–53
data cleaning, 52–60. See also automatic data cleansing
data engineering. See engineering input and output
data integration, 52
data mining, 4–5, 9
data ownership rights, 35
data preparation, 52–60
data transformation. See attribute transformations
DataVisualizer, 389, 390, 430
data warehouse, 52–53
date attributes, 55
decision list, 11, 67
decision nodes, 328
decision stump, 325
DecisionStump, 407, 453, 454
decision table, 62, 295
DecisionTable, 408
decision tree, 14, 62–65, 97–105
  complexity of induction, 196
  converting to rules, 198
  data cleaning, 312–313
  error rates, 192–196
  highly branching attributes, 102–105
  missing values, 63, 191–192
  multiclass case, 107
  multivariate, 199
  nominal attribute, 62
  numeric attribute, 62, 189–191
  partial, 207–210
  pruning, 192–193, 312
  replicated subtree, 66
  rules, 198
  subtree raising, 193, 197
  subtree replacement, 192–193, 197
  three-way split, 63
  top-down induction, 97–105, 196–198
  two-way split, 62
  univariate, 199
  Weka, 406–408
  Weka’s User Classifier facility, 63–65
Decorate, 416
deduction, 350
default rule, 110
degrees of freedom, 93, 155
delta, 311
dendrograms, 82
denormalization, 47
density function, 93
diagnosis, 25–26
dichotomy, 51
directed acyclic graph, 272
direct marketing, 27
discrete attributes, 50. See also nominal attributes
Discretize, 396, 398, 402
discretizing numeric attributes, 287, 296–305
  chi-squared test, 302
  converting discrete to numeric attributes, 304–305
  entropy-based discretization, 298–302
  error-based discretization, 302–304
  global discretization, 297
  local discretization, 297
  supervised discretization, 297, 298
  unsupervised discretization, 297–298
  Weka, 398
disjunction, 32, 65
disjunctive normal form, 69
distance functions, 128–129, 239–242
distributed experiments in Weka, 445
distribution, 304
distributionForInstance(), 453, 481
divide-and-conquer. See decision tree
