   for (i in 1:(n-k)) {
      if (sum(x[i:(i+(k-1))]) >= k2) pred[i] <- 1 else pred[i] <- 0
   }
   return(mean(abs(pred-x[(k+1):n])))
}
predb <- function(x,k) {
   n <- length(x)
   k2 <- k/2
   pred <- vector(length=n-k)
   # sum of the first window, x[1:k]
   sm <- sum(x[1:k])
   if (sm >= k2) pred[1] <- 1 else pred[1] <- 0
   if (n-k >= 2) {
      for (i in 2:(n-k)) {
         # slide the window: add the entering value, subtract the leaving one
         sm <- sm + x[i+k-1] - x[i-1]
         if (sm >= k2) pred[i] <- 1 else pred[i] <- 0
      }
   }
   return(mean(abs(pred-x[(k+1):n])))
}
Since the latter avoids duplicate computation, we speculated it would be
faster. Now is the time to check that.
> y <- sample(0:1,100000,replace=T)
> system.time(preda(y,1000))
   user  system elapsed
  3.816   0.016   3.873
> system.time(predb(y,1000))
   user  system elapsed
  1.392   0.008   1.427
Hey, not bad! That’s quite an improvement.
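To see why the running-sum update in predb() gives the same window sums as
recomputing each one from scratch, here is a minimal sketch on a small
made-up vector. The data and the names direct and running are illustrative
assumptions, not part of the original functions.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(1, 0, 1, 1, 0, 1, 0, 0)
k <- 3
n <- length(x)

# Direct approach (as in preda): recompute every window sum from scratch
direct <- sapply(1:(n-k), function(i) sum(x[i:(i+k-1)]))

# Running-sum approach (as in predb): add the value entering the window
# and subtract the value leaving it
running <- numeric(n-k)
running[1] <- sum(x[1:k])
for (i in 2:(n-k)) {
   running[i] <- running[i-1] + x[i+k-1] - x[i-1]
}

# Both approaches should yield the same window sums
all.equal(direct, running)
```

The direct version does about k additions per window, while the running
version does two operations per window regardless of k, which is consistent
with the speedup seen in the timings.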
However, you should always ask whether R already has a fine-tuned function
that will suit your needs. Since we’re basically computing a moving average,
we might try the filter() function, with a constant coefficient vector, as
follows:
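predc <- function(x,k) {
   n <- length(x)
   # one-sided moving sum of the k most recent values;
   # keep the windows ending at positions k through n-1
   f <- filter(x,rep(1,k),sides=1)[k:(n-1)]
   k2 <- k/2
   pred <- as.integer(f >= k2)
   return(mean(abs(pred-x[(k+1):n])))
}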
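As a rough illustration of what the filter() call in predc() produces, the
sketch below applies it to a small made-up vector and compares the result
with window sums computed directly. The toy data and the names ms, f, and
direct are assumptions for this example. The call is written as
stats::filter() here only to guard against other packages that define their
own filter(); that qualification is not in the original code.

```r
# Hypothetical toy data, chosen only for illustration
x <- c(1, 0, 1, 1, 0, 1, 0, 0)
k <- 3
n <- length(x)

# One-sided moving sum with a constant coefficient vector of k ones.
# The first k-1 entries are NA because no full window is available there.
ms <- stats::filter(x, rep(1, k), sides = 1)
as.numeric(ms)

# Windows ending at positions k..(n-1) are the ones used to
# predict x[(k+1):n], hence the subscript k:(n-1)
f <- as.numeric(ms[k:(n-1)])

# Same window sums, computed directly for comparison
direct <- sapply(1:(n-k), function(i) sum(x[i:(i+k-1)]))
all.equal(f, direct)
```

The last window, ending at position n, is dropped because there is no
following observation left to predict.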