This is useful for initializing sum variables that are shared by the threads,
for instance. As noted earlier, an automatic barrier is placed after the block.
This should make sense to you. If one thread is initializing a sum, you wouldn’t
want other threads that make use of this variable to continue execution until
the sum has been properly set.
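For instance, here is a minimal sketch using OpenMP's single directive,
which gives exactly this pattern: one thread initializes the shared sum,
and the automatic barrier at the end of the block holds the other threads
back until that has happened. (The array contents and sizes are made up
purely for illustration; compile with gcc -fopenmp.)

   #include <stdio.h>

   #define N 100000

   int main()
   {  static float x[N];
      float sum;  // shared by all the threads
      int i;
      for (i = 0; i < N; i++) x[i] = 1.0;  // made-up data
      #pragma omp parallel
      {
         // only one thread executes this block; the automatic barrier
         // after the block keeps the other threads from adding to sum
         // before it has been set to 0
         #pragma omp single
         sum = 0.0;
         #pragma omp for
         for (i = 0; i < N; i++) {
            #pragma omp atomic
            sum += x[i];
         }
      }
      printf("%f\n", sum);  // prints 100000.000000
      return 0;
   }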
You can learn more about OpenMP in my open source textbook on parallel
processing at http://heather.cs.ucdavis.edu/parprocbook.
16.3.6 GPU Programming
Another type of shared-memory parallel hardware consists of graphics
processing units (GPUs). If you have a sophisticated graphics card in your
machine, say for playing games, you may not realize that it is also a very
powerful computational device, so powerful that the slogan “A supercomputer
on your desk!” is often used to refer to PCs equipped with high-end GPUs.
As with OpenMP, the idea here is that instead of writing parallel R,
you write R code interfaced to parallel C. (As in the OpenMP case, C
here means a slightly augmented version of the C language.) The technical
details become rather complex, so I'll show just a brief sketch rather
than a full example, but an overview of the platform is worthwhile.
As mentioned, GPUs do follow the shared-memory/threads model,
but on a much larger scale. They have dozens, or even hundreds, of
cores (depending on how you define core). One major difference is that
several threads can be run together in a block, which can produce certain
efficiencies.
Programs that access GPUs begin their run on your machine’s CPU,
referred to as the host. They then start code running on the GPU, or device.
This means that your data must be transferred from the host to the device,
and after the device finishes its computation, the results must be transferred
back to the host.
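To make this flow concrete, here is a minimal sketch in the augmented C
just described (NVIDIA's CUDA dialect). The kernel name, array, and sizes
are all made up for illustration; each GPU thread doubles one element of
the array. Note the two explicit host/device transfers, and note that the
launch line groups the threads into blocks, as mentioned above.

   #include <stdio.h>
   #include <stdlib.h>
   #include <cuda_runtime.h>

   // kernel, run on the device; each thread handles one element
   __global__ void doubleit(float *dx, int n)
   {  int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) dx[i] *= 2.0f;
   }

   int main()
   {  int n = 100000, i;
      float *hx = (float *) malloc(n * sizeof(float));  // host copy
      float *dx;                                        // device copy
      for (i = 0; i < n; i++) hx[i] = i;  // made-up data
      cudaMalloc((void **) &dx, n * sizeof(float));
      // transfer the data from the host to the device
      cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
      // launch the kernel, grouping the threads into blocks
      int threadsperblock = 256;
      int nblocks = (n + threadsperblock - 1) / threadsperblock;
      doubleit<<<nblocks, threadsperblock>>>(dx, n);
      // transfer the results back from the device to the host
      cudaMemcpy(hx, dx, n * sizeof(float), cudaMemcpyDeviceToHost);
      cudaFree(dx);
      printf("%f\n", hx[3]);  // prints 6.000000
      free(hx);
      return 0;
   }

A file like this is compiled with NVIDIA's nvcc compiler, which handles
both the <<< >>> launch notation and the ordinary C.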
As of this writing, GPU programming has not yet become common among R
users. The most common usage is probably through the CRAN package
gputools, which consists of some matrix algebra and statistical routines
callable from R. For instance, consider matrix inversion. R provides the
function solve() for this, but a parallel alternative is available in
gputools with the name gpuSolve().
For more about GPU programming, again see my book on parallel processing
at http://heather.cs.ucdavis.edu/parprocbook.
16.4 General Performance Considerations
This section discusses some issues that you may find generally useful in
parallelizing R applications. I'll present some material on the main
sources of overhead and then discuss a couple of algorithmic issues.