The Art of R Programming


14.5 Byte Code Compilation


Starting with version 2.13, R has included a byte code compiler, which you can
use to try to speed up your code. Recall the trivial example from Section 14.2.1,
where we showed that

z <- x + y

was much faster than

for (i in 1:length(x)) z[i] <- x[i] + y[i]

Again, that was obvious, but just to get an idea of how byte code compilation
works, let’s give it a try:

> library(compiler)
> f <- function() for (i in 1:length(x)) z[i] <<- x[i] + y[i]
> cf <- cmpfun(f)
> system.time(cf())
   user  system elapsed
  0.845   0.003   0.848

We created a new function, cf(), from the original f(). The new code's
run time was 0.848 seconds, much faster than the 8.175 seconds the
noncompiled version took. Granted, it still wasn't as fast as the straightforward
vectorized code, but it is clear that byte code compilation has potential. You
should try it whenever you need faster code.
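
To repeat this comparison on your own machine, something like the following sketch may help. It times a loop version, its compiled counterpart, and the vectorized version side by side. The names add_loop() and add_loop_c() and the vector length are made up for illustration. Note also that in R 3.4.0 and later, the just-in-time compiler is enabled by default and compiles functions automatically, so the sketch turns it off with enableJIT(0) to keep the plain version uncompiled. Timings will vary by machine and R version.

```r
library(compiler)

# Hypothetical test data; the length is illustrative only
n <- 100000
x <- runif(n)
y <- runif(n)

# Turn off JIT compilation so the plain function stays uncompiled;
# enableJIT() returns the previous level so we can restore it later
old_jit <- enableJIT(0)

# Loop version (hypothetical name), written without global assignment
add_loop <- function(x, y) {
  z <- numeric(length(x))
  for (i in seq_along(x)) z[i] <- x[i] + y[i]
  z
}

# Byte-compiled copy of the same function
add_loop_c <- cmpfun(add_loop)

t_plain <- system.time(z1 <- add_loop(x, y))
t_comp  <- system.time(z2 <- add_loop_c(x, y))
t_vec   <- system.time(z3 <- x + y)

# Restore the previous JIT setting
enableJIT(old_jit)

# Elapsed times in seconds for the three versions
print(c(plain      = t_plain[["elapsed"]],
        compiled   = t_comp[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```

All three versions should produce the same vector; only the run times differ. On a recent version of R with the JIT left on, the gap between the plain and compiled versions may largely disappear.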

14.6 Oh No, the Data Doesn’t Fit into Memory!


As mentioned earlier, all objects in an R session are stored in memory. R
places a limit of 2^31 − 1 bytes on the size of any object, regardless of word
size (32-bit versus 64-bit) and the amount of RAM in your machine. However,
you really should not consider this an obstacle. With a little extra care,
applications that have large memory requirements can indeed be handled
well in R. Some common approaches are chunking and using R packages for
memory management.
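
To get a feel for the numbers involved, here is a rough sketch of the arithmetic. The column count is an assumption chosen only for illustration. The figure of 8 bytes per value is the storage size of an R double.

```r
# The 2^31 - 1 figure is also R's largest representable integer
limit <- 2^31 - 1
limit == .Machine$integer.max

# Rough size of a hypothetical data set:
# 1,000,000 records by 20 numeric columns (column count assumed)
n_records <- 1e6
n_cols <- 20
bytes_per_double <- 8
est_bytes <- n_records * n_cols * bytes_per_double
est_bytes / 2^20   # estimated size in MiB

# object.size() reports the footprint of an object already in memory
v <- numeric(1000)
object.size(v)
```

An estimate like this, made before reading any data, gives a quick indication of whether chunking or a memory management package is worth the trouble.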

14.6.1 Chunking


One option involving no extra R packages at all is to read in your data from
a disk file one chunk at a time. For example, suppose that our goal is to find
means or proportions of some variables. We can use the skip argument in
read.table().
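
As a minimal sketch of the idea, the code below computes the mean of one column by reading a file in chunks. The file, its column names a and b, and the chunk size are all made up so that the example is self-contained. It also assumes we pair skip with the nrows argument of read.table(), so that each read stops at the end of its chunk.

```r
# Hypothetical data file, created here only to make the sketch runnable
set.seed(1)
n_total <- 1000
dat <- data.frame(a = rnorm(n_total), b = runif(n_total))
tmp <- tempfile(fileext = ".txt")
write.table(dat, tmp, row.names = FALSE, col.names = FALSE)

chunk_size <- 100                    # records per chunk (illustrative)
n_chunks <- n_total / chunk_size

total <- 0   # running sum of column a
count <- 0   # running number of records seen

for (k in seq_len(n_chunks)) {
  # Skip the records already processed, then read just one chunk
  chunk <- read.table(tmp,
                      skip = (k - 1) * chunk_size,
                      nrows = chunk_size,
                      col.names = c("a", "b"))
  total <- total + sum(chunk$a)
  count <- count + nrow(chunk)
}

chunked_mean <- total / count
chunked_mean
```

Only one chunk is held in memory at a time, and the running sum and count are enough to recover the overall mean. One cost of this simple approach is that each call to read.table() must pass over the skipped lines again, so later chunks take longer to reach.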
Say our data set has 1,000,000 records and we divide them into 10
chunks (or more—whatever is needed to cut the data down to a size so it
fits in memory). Then we set skip = 0 on our first read, set skip = 100000
the second time, and so on. Each time we read in a chunk, we calculate
