27          mysum += procpairs(i,m,nval);
28       }
29       #pragma omp atomic
30       tot += mysum;
31    }
32    int divisor = nval*(nval-1) / 2;
33    *mlmean = ((float) tot)/divisor;
34 }
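As an aside, the explicit omp atomic addition of each thread's mysum into tot could also be expressed with OpenMP's reduction clause. Below is a minimal, self-contained sketch of that alternative pattern, not a drop-in replacement for the listing above; countpair() is just a hypothetical stand-in for procpairs().

   // sketch only: sums countpair(i,j) over all pairs (i,j), i < j
   #include <stdio.h>
   #include <omp.h>

   static int countpair(int i, int j)
   {  return (i + j) % 2;  // placeholder "work" for the pair (i,j)
   }

   int pairtotal(int n)
   {  int tot = 0;
      // each thread accumulates into a private copy of tot;
      // OpenMP adds the copies together when the loop finishes
      #pragma omp parallel for reduction(+:tot) schedule(dynamic)
      for (int i = 0; i < n; i++)
         for (int j = i+1; j < n; j++)
            tot += countpair(i,j);
      return tot;
   }

   int main(void)
   {  printf("%d\n", pairtotal(1000));  // compile with gcc -fopenmp
      return 0;
   }

With reduction(+:tot), no atomic directive is needed, since OpenMP itself combines the per-thread totals at the end of the parallel loop.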
16.3.3 Running the OpenMP Code
Again, compilation follows the recipe in Chapter 15. We do need to link in
the OpenMP library, though, by using the -fopenmp and -lgomp options. Sup-
pose our source file is romp.c. Then we use the following commands to build
the code:
gcc -std=gnu99 -fopenmp -I/usr/share/R/include -fpic -g -O2 -c romp.c -o romp.o
gcc -std=gnu99 -shared -o romp.so romp.o -L/usr/lib/R/lib -lR -lgomp
Here’s an R test:
> dyn.load("romp.so")
> Sys.setenv(OMP_NUM_THREADS=4)
> n <- 1000
> m <- matrix(sample(0:1,n^2,replace=T),nrow=n)
> system.time(z <- .C("mutlinks",as.integer(m),as.integer(n),result=double(1)))
user system elapsed
0.830 0.000 0.218
> z$result
[1] 249.9471
The typical way to specify the number of threads in OpenMP is through
an operating system environment variable, OMP_NUM_THREADS. R is capable of
setting operating system environment variables with the Sys.setenv() func-
tion. Here, I set the number of threads to 4, because I was running on a
quad-core machine.
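If you prefer, the thread count can also be set from inside the C code itself, via the OpenMP routine omp_set_num_threads(); a call made before a parallel region takes precedence over OMP_NUM_THREADS. Here is a minimal standalone sketch (separate from romp.c) illustrating this:

   #include <stdio.h>
   #include <omp.h>

   int main(void)
   {  omp_set_num_threads(4);   // overrides OMP_NUM_THREADS for later parallel regions
      printf("max threads: %d\n", omp_get_max_threads());
      #pragma omp parallel
      {
         #pragma omp single     // only one thread reports the team size
         printf("team size: %d\n", omp_get_num_threads());
      }
      return 0;
   }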
Note the runtime: only 0.2 seconds! This compares to the 5.0-second
time we saw earlier for a 12-node snow system. This might be surprising to
some readers, as our code in the snow version was vectorized to a fair degree,
as mentioned earlier. Vectorizing is good, but again, R has many hidden
sources of overhead, so C might do even better.
NOTE   I tried R's new byte-compilation function cmpfun(), but mtl() actually became slower.
Thus, if you are willing to write part of your code in parallel C, dramatic
speedups may be possible.