Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


In practice there is usually only a single dataset of limited size. What can be
done? We could split the data into (perhaps 10) subsets and perform a cross-
validation on each. However, the overall result will only tell us whether a learn-
ing scheme is preferable for that particular size—perhaps one-tenth of the
original dataset. Alternatively, the original dataset could be reused—for
example, with different randomizations of the dataset for each cross-validation.^2
However, the resulting cross-validation estimates will not be independent
because they are not based on independent datasets. In practice, this means that
a difference may be judged to be significant when in fact it is not. Indeed, merely
increasing the number of samples k, that is, the number of cross-validation runs,
will eventually yield an apparently significant difference, because the value of the
t-statistic increases without bound.
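To make the unbounded growth concrete, here is a minimal sketch (the function name and the numbers are ours, not from the book) that computes the standard paired t-statistic on a fixed pattern of accuracy differences repeated more and more times. Because the mean and variance stay roughly constant while k grows, the statistic grows roughly with the square root of k:

```python
import math

def t_statistic(diffs):
    """Standard paired t-statistic: mean difference over its standard error."""
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance
    return mean / math.sqrt(var / k)

# The same small, noisy pattern of differences, repeated more and more times
# (hypothetical numbers): mean and variance barely change, but t keeps growing.
base = [0.02, -0.01, 0.03, 0.00, 0.01]
for reps in (2, 20, 200):
    print(reps * len(base), round(t_statistic(base * reps), 2))
```

Running this shows the statistic climbing steadily as k increases, even though the underlying differences carry no new information, which is exactly the problem described above.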
Various modifications of the standard t-test have been proposed to circum-
vent this problem, all of them heuristic and lacking sound theoretical justifica-
tion. One that appears to work well in practice is the corrected resampled t-test.
Assume for the moment that the repeated holdout method is used instead of
cross-validation, repeated k times on different random splits of the same dataset
to obtain accuracy estimates for two learning methods. Each time, n_1 instances
are used for training and n_2 for testing, and differences d_i are computed from
performance on the test data. The corrected resampled t-test uses the modified
statistic

in exactly the same way as the standard t-statistic. A closer look at the formula
shows that its value cannot be increased simply by increasing k. The same mod-
ified statistic can be used with repeated cross-validation, which is just a special
case of repeated holdout in which the individual test sets for one cross-
validation do not overlap. For 10-fold cross-validation repeated 10 times,
k = 100, n_2/n_1 = 0.1/0.9, and σ_d² is based on 100 differences.
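The corrected statistic can be sketched as follows; the function name and the illustrative accuracy differences are ours. The n_2/n_1 term keeps the variance factor from shrinking to zero as k grows, so the statistic stays bounded no matter how many runs are added:

```python
import math

def corrected_resampled_t(diffs, n_train, n_test):
    """Corrected resampled t-statistic: like the standard paired t, but the
    1/k variance factor is inflated by n_test/n_train, so the value cannot
    be driven up simply by adding more runs."""
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance of the d_i
    return mean / math.sqrt((1 / k + n_test / n_train) * var)

# 10-fold cross-validation repeated 10 times: k = 100 differences,
# n_test/n_train = 0.1/0.9 (hypothetical accuracy differences below).
diffs = [0.02, -0.01, 0.03, 0.00, 0.01] * 20
print(round(corrected_resampled_t(diffs, n_train=90, n_test=10), 2))  # prints 2.02
```

Feeding this function ten times as many differences with the same train/test ratio barely changes its value, in contrast to the standard t-statistic above, which would keep growing.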

5.6 Predicting probabilities


Throughout this section we have tacitly assumed that the goal is to maximize
the success rate of the predictions. The outcome for each test instance is either
correct, if the prediction agrees with the actual value for that instance, or incor-
rect, if it does not. There are no grays: everything is black or white, correct or

The modified statistic referred to above is

t = \frac{\bar{d}}{\sqrt{\left(\frac{1}{k} + \frac{n_2}{n_1}\right)\sigma_d^2}}
² The method was advocated in the first edition of this book.