  - 1.4 Machine learning and statistics
  - 1.5 Generalization as search
    - Enumerating the concept space
    - Bias
  - 1.6 Data mining and ethics
  - 1.7 Further reading
- 2 Input: Concepts, instances, and attributes
  - 2.1 What’s a concept?
  - 2.2 What’s in an example?
  - 2.3 What’s in an attribute?
  - 2.4 Preparing the input
    - Gathering the data together
    - ARFF format
    - Sparse data
    - Attribute types
    - Missing values
    - Inaccurate values
    - Getting to know your data
  - 2.5 Further reading
- 3 Output: Knowledge representation
  - 3.1 Decision tables
  - 3.2 Decision trees
  - 3.3 Classification rules
  - 3.4 Association rules
  - 3.5 Rules with exceptions
  - 3.6 Rules involving relations
  - 3.7 Trees for numeric prediction
  - 3.8 Instance-based representation
  - 3.9 Clusters
  - 3.10 Further reading
- 4 Algorithms: The basic methods
  - 4.1 Inferring rudimentary rules
    - Missing values and numeric attributes
    - Discussion
  - 4.2 Statistical modeling
    - Missing values and numeric attributes
    - Bayesian models for document classification
    - Discussion
  - 4.3 Divide-and-conquer: Constructing decision trees
    - Calculating information
    - Highly branching attributes
    - Discussion
  - 4.4 Covering algorithms: Constructing rules
    - Rules versus trees
    - A simple covering algorithm
    - Rules versus decision lists
  - 4.5 Mining association rules
    - Item sets
    - Association rules
    - Generating rules efficiently
    - Discussion
  - 4.6 Linear models
    - Numeric prediction: Linear regression
    - Linear classification: Logistic regression
    - Linear classification using the perceptron
    - Linear classification using Winnow
  - 4.7 Instance-based learning
    - The distance function
    - Finding nearest neighbors efficiently
    - Discussion
  - 4.8 Clustering
    - Iterative distance-based clustering
    - Faster distance calculations
    - Discussion
  - 4.9 Further reading
- 5 Credibility: Evaluating what’s been learned
  - 5.1 Training and testing
  - 5.2 Predicting performance
  - 5.3 Cross-validation
  - 5.4 Other estimates
    - Leave-one-out
    - The bootstrap
  - 5.5 Comparing data mining methods
  - 5.6 Predicting probabilities
    - Quadratic loss function
    - Informational loss function
    - Discussion
  - 5.7 Counting the cost
    - Cost-sensitive classification
    - Cost-sensitive learning
    - Lift charts
    - ROC curves
    - Recall–precision curves
    - Discussion
    - Cost curves
  - 5.8 Evaluating numeric prediction
  - 5.9 The minimum description length principle
  - 5.10 Applying the MDL principle to clustering
  - 5.11 Further reading
- 6 Implementations: Real machine learning schemes
  - 6.1 Decision trees
    - Numeric attributes
    - Missing values
    - Pruning
    - Estimating error rates
    - Complexity of decision tree induction
    - From trees to rules
    - C4.5: Choices and options
    - Discussion
  - 6.2 Classification rules
    - Criteria for choosing tests
    - Missing values, numeric attributes
    - Generating good rules
    - Using global optimization
    - Obtaining rules from partial decision trees
    - Rules with exceptions
    - Discussion
  - 6.3 Extending linear models
    - The maximum margin hyperplane
    - Nonlinear class boundaries
    - Support vector regression
    - The kernel perceptron
    - Multilayer perceptrons
    - Discussion
  - 6.4 Instance-based learning
    - Reducing the number of exemplars
    - Pruning noisy exemplars
    - Weighting attributes
    - Generalizing exemplars
    - Distance functions for generalized exemplars
    - Generalized distance functions
    - Discussion
  - 6.5 Numeric prediction
    - Model trees
    - Building the tree
    - Pruning the tree
    - Nominal attributes
    - Missing values
    - Pseudocode for model tree induction
    - Rules from model trees
    - Locally weighted linear regression
    - Discussion
  - 6.6 Clustering
    - Choosing the number of clusters
    - Incremental clustering
    - Category utility
    - Probability-based clustering
    - The EM algorithm
    - Extending the mixture model
    - Bayesian clustering
    - Discussion
  - 6.7 Bayesian networks
    - Making predictions
    - Learning Bayesian networks
    - Specific algorithms
    - Data structures for fast learning
    - Discussion
- 7 Transformations: Engineering the input and output
  - 7.1 Attribute selection
    - Scheme-independent selection
    - Searching the attribute space
    - Scheme-specific selection
  - 7.2 Discretizing numeric attributes
    - Unsupervised discretization
    - Entropy-based discretization
    - Other discretization methods
    - Entropy-based versus error-based discretization
    - Converting discrete to numeric attributes
  - 7.3 Some useful transformations
    - Principal components analysis
    - Random projections
    - Text to attribute vectors
    - Time series
  - 7.4 Automatic data cleansing
    - Improving decision trees
    - Robust regression
    - Detecting anomalies
  - 7.5 Combining multiple models
    - Bagging
    - Bagging with costs
    - Randomization
    - Boosting
    - Additive regression
    - Additive logistic regression
    - Option trees
    - Logistic model trees
    - Stacking
    - Error-correcting output codes
  - 7.6 Using unlabeled data
    - Clustering for classification
    - Co-training
    - EM and co-training
  - 7.7 Further reading
- 8 Moving on: Extensions and applications
  - 8.1 Learning from massive datasets
  - 8.2 Incorporating domain knowledge
  - 8.3 Text and Web mining
  - 8.4 Adversarial situations
  - 8.5 Ubiquitous data mining
  - 8.6 Further reading
- Part II The Weka machine learning workbench
- 9 Introduction to Weka
  - 9.1 What’s in Weka?
  - 9.2 How do you use it?
  - 9.3 What else can you do?
  - 9.4 How do you get it?
- 10 The Explorer
  - 10.1 Getting started
    - Preparing the data
    - Loading the data into the Explorer
    - Building a decision tree
    - Examining the output
    - Doing it again
    - Working with models
    - When things go wrong
  - 10.2 Exploring the Explorer
    - Loading and filtering files
    - Training and testing learning schemes
    - Do it yourself: The User Classifier
    - Using a metalearner
    - Clustering and association rules
    - Attribute selection
    - Visualization
  - 10.3 Filtering algorithms
    - Unsupervised attribute filters
    - Unsupervised instance filters
    - Supervised filters
  - 10.4 Learning algorithms
    - Bayesian classifiers
    - Trees
    - Rules
    - Functions
    - Lazy classifiers
    - Miscellaneous classifiers
  - 10.5 Metalearning algorithms
    - Bagging and randomization
    - Boosting
    - Combining classifiers
    - Cost-sensitive learning
    - Optimizing performance
    - Retargeting classifiers for different tasks
  - 10.6 Clustering algorithms
  - 10.7 Association-rule learners
  - 10.8 Attribute selection
    - Attribute subset evaluators
    - Single-attribute evaluators
    - Search methods
- 11 The Knowledge Flow interface
  - 11.1 Getting started
  - 11.2 The Knowledge Flow components
  - 11.3 Configuring and connecting the components
  - 11.4 Incremental learning
- 12 The Experimenter
  - 12.1 Getting started
    - Running an experiment
    - Analyzing the results
  - 12.2 Simple setup
  - 12.3 Advanced setup
  - 12.4 The Analyze panel
  - 12.5 Distributing processing over several machines
- 13 The command-line interface
  - 13.1 Getting started
  - 13.2 The structure of Weka
    - Classes, instances, and packages
    - The weka.core package
    - The weka.classifiers package
    - Other packages
    - Javadoc indices
  - 13.3 Command-line options
    - Generic options
    - Scheme-specific options
- 14 Embedded machine learning
  - 14.1 A simple data mining application
  - 14.2 Going through the code
    - main()
    - MessageClassifier()
    - updateData()
    - classifyMessage()
- 15 Writing new learning schemes
  - 15.1 An example classifier
    - buildClassifier()
    - makeTree()
    - computeInfoGain()
    - classifyInstance()
    - main()
  - 15.2 Conventions for implementing classifiers
- References
- Index
- About the authors