The Art of R Programming

This is better, as we’ve reduced the number of memory allocations to just two, down from possibly many in the first version of the code. If we really need the speed, we might consider recoding this in C, as discussed in Chapter 14.

2.5.2 Extended Example: Predicting Discrete-Valued Time Series

Suppose we observe 0- and 1-valued data, one per time period. To make things concrete, say it’s daily weather data: 1 for rain and 0 for no rain. Suppose we wish to predict whether it will rain tomorrow, knowing whether it rained or not in recent days. Specifically, for some number k, we will predict tomorrow’s weather based on the weather record of the last k days. We’ll use majority rule: If the number of 1s in the previous k time periods is at least k/2, we’ll predict the next value to be 1; otherwise, our prediction is 0. For instance, if k = 3 and the data for the last three periods is 1,0,1, we’ll predict the next period to be a 1.

But how should we choose k? Clearly, if we choose too small a value, it may give us too small a sample from which to predict. Too large a value will cause us to rely on data from the distant past that may have little or no predictive value.

A common solution to this problem is to take known data, called a training set, and then ask how well various values of k would have performed on that data.

In the weather case, suppose we have 500 days of data and suppose we are considering using k = 3. To assess the predictive ability of that value for k, we “predict” each day in our data from the previous three days and then compare the predictions with the known values. After doing this throughout our data, we have an error rate for k = 3. We do the same for k = 1, k = 2, k = 4, and so on, up to some maximum value of k that we feel is enough. We then use whichever value of k worked best in our training data for future predictions.

So how would we code that in R? Here’s a naive approach:

1  preda <- function(x,k) {
2     n <- length(x)
3     k2 <- k/2
4     # the vector pred will contain our predicted values
5     pred <- vector(length=n-k)
6     for (i in 1:(n-k)) {
7        if (sum(x[i:(i+(k-1))]) >= k2) pred[i] <- 1 else pred[i] <- 0
8     }
9     return(mean(abs(pred-x[(k+1):n])))
10 }
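
The selection procedure described earlier (compute an error rate for each candidate k, then keep the best one) might be sketched as follows. The helper name errbyk and the eight-value toy series are illustrative assumptions, not part of the original text; preda is repeated without line numbers only so that the sketch runs on its own.

```r
# Same logic as the numbered listing of preda(), repeated so this runs standalone.
preda <- function(x, k) {
  n <- length(x)
  k2 <- k / 2
  pred <- vector(length = n - k)
  for (i in 1:(n - k)) {
    if (sum(x[i:(i + (k - 1))]) >= k2) pred[i] <- 1 else pred[i] <- 0
  }
  return(mean(abs(pred - x[(k + 1):n])))
}

# Hypothetical helper: training-set error rate for each k in 1..kmax.
errbyk <- function(x, kmax) {
  sapply(1:kmax, function(k) preda(x, k))
}

# Toy series, small enough to check by hand (1 = rain, 0 = no rain).
x <- c(1, 0, 1, 1, 0, 1, 1, 1)
errs <- errbyk(x, 3)       # error rates for k = 1, 2, 3
bestk <- which.min(errs)   # index of the smallest error rate
print(errs)
print(bestk)
```

On this toy series the error rates are 4/7, 1/6, and 1/5 for k = 1, 2, 3, so k = 2 would be chosen. With real data, x would be the full training record and kmax an upper bound chosen by the analyst. Line numbers in the discussion that follows refer to the numbered listing of preda.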

The heart of the code is line 7. There, we’re predicting day i+k (prediction to be stored in pred[i]) from the k days previous to it: that is, days i,...,i+k-1. Thus, we need to count the 1s among those days. Since we’re
