The Art of R Programming


in charging back to numeric at the receiver), but the character form tends
to make for much longer messages, thus longer network transfer time.
Shared-memory systems can be networked together, which is in fact what we
did in the previous example: we had a hybrid arrangement in which we formed
snow clusters from several networked dual-core computers.


16.4.2 Embarrassingly Parallel Applications and Those That Aren’t


It’s no shame to be poor, but it’s no great honor either.
—Tevye, Fiddler on the Roof

Man is the only animal that blushes, or needs to.
—Mark Twain

The term embarrassingly parallel is heard often in talk about parallel R
(and in the parallel processing field in general). The word embarrassing
alludes to the fact that the problems are so easy to parallelize that there is
no intellectual challenge involved; they are embarrassingly easy.
Both of the example applications we’ve looked at here would be considered
embarrassingly parallel. Parallelizing the for i loop for the mutual
outlinks problem in Section 16.1 was pretty obvious. Partitioning the work in
the KMC example in Section 16.2.4 was also natural and easy.
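The kind of partitioning used in those examples can be sketched with R's parallel package (which subsumes snow); the task, chunking scheme, and cluster size below are illustrative assumptions, not the book's exact code:

```r
library(parallel)

# Illustrative embarrassingly parallel job: per-chunk sums of squares.
# Each worker processes its chunk independently; the only communication
# is shipping the chunks out and collecting one number back per worker.
cls <- makeCluster(2)                      # two worker processes

x <- 1:100000
chunks <- split(x, cut(seq_along(x), 2))   # partition the work into 2 pieces

partials <- parLapply(cls, chunks, function(ch) sum(ch^2))
total <- Reduce(`+`, partials)             # cheap serial combine step

stopCluster(cls)
```

The combine step here is a single addition, which is exactly the low-communication profile that makes such problems attractive.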
By contrast, most parallel sorting algorithms require a great deal of
interaction. For instance, consider merge sort, a common method of sort-
ing numbers. It breaks the vector to be sorted into two (or more) indepen-
dent parts, say the left half and right half, which are then sorted in paral-
lel by two processes. So far, this is embarrassingly parallel, at least after the
vector is divided in half. But then the two sorted halves must be merged
to produce the sorted version of the original vector, and that process is
not embarrassingly parallel. It can be parallelized, but in a more complex
manner.
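The structure just described can be sketched as a simplified two-way merge sort (assuming an input of length at least 2 and a cluster set up as before; this is a sketch, not the book's code):

```r
library(parallel)

# Two halves sorted in parallel (the embarrassingly parallel phase),
# then merged serially (the phase that is NOT embarrassingly parallel).
mergesort2 <- function(v, cls) {
  mid <- length(v) %/% 2
  halves <- list(v[1:mid], v[(mid + 1):length(v)])
  s <- parLapply(cls, halves, sort)   # independent; no worker interaction
  # straightforward serial merge of the two sorted halves
  out <- numeric(length(v)); i <- 1; j <- 1
  for (k in seq_along(out)) {
    if (j > length(s[[2]]) ||
        (i <= length(s[[1]]) && s[[1]][i] <= s[[2]][j])) {
      out[k] <- s[[1]][i]; i <- i + 1
    } else {
      out[k] <- s[[2]][j]; j <- j + 1
    }
  }
  out
}

cls <- makeCluster(2)
mergesort2(c(5, 3, 8, 1, 9, 2), cls)   # 1 2 3 5 8 9
stopCluster(cls)
```

A parallel merge is possible but requires the coordinated, finer-grained partitioning alluded to above.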
Of course, to paraphrase Tevye, it’s no shame to have an embarrassingly
parallel problem! It may not exactly be an honor, but it is a cause for cele-
bration, as it is easy to program. More important, embarrassingly parallel
problems tend to have low communication overhead, which is crucial to per-
formance, as discussed earlier. In fact, when most people refer to embarrass-
ingly parallel applications, they have this low overhead in mind.
But what about nonembarrassingly parallel applications? Unfortunately,
parallel R code is simply not suitable for many of them for a very basic rea-
son: the functional programming nature of R. As discussed in Section 14.3,
a statement like this:


x[3] <- 8


is deceptively simple, because it can cause the entire vector x to be
rewritten. This really compounds communication traffic problems. Accordingly,
if your application is not embarrassingly parallel, your best strategy is prob-
ably to write the computationally intensive parts of the code in C, say using
OpenMP or GPU programming.
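The copying that such an assignment can trigger is observable with base R's tracemem() (available in the standard binary builds); a minimal sketch:

```r
x <- runif(5)
y <- x          # a second reference to the same underlying data
tracemem(x)     # report whenever x's memory block is duplicated
x[3] <- 8       # the "simple" assignment: forces a copy of the whole vector
untracemem(x)
```

When the copy occurs, R prints a tracemem line for x, confirming that the single-element assignment rewrote the entire vector.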


Parallel R 347