
warnings that split() would give us (“data length is not a multiple of split
variable”) by calling options().
The real work is done in line 23, where we call the snow function
clusterApply(). This function initiates a call to the same specified func-
tion (mtl() here), with some arguments specific to each worker and some
optional arguments common to all. So, here’s what the call in line 23 does
(a sketch of the overall pattern follows the list):



  1. Worker 1 will be directed to call the function mtl() with the arguments
     ichunks[[1]] and m.

  2. Worker 2 will call mtl() with the arguments ichunks[[2]] and m, and so on
     for all workers.

  3. Each worker will perform its assigned task and then return the result to
     the manager.

  4. The manager will collect all such results into an R list, which we have
     assigned here to counts.
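
Schematically, the whole pattern looks something like this. This is a
sketch, not the book's full listing; the 4-worker local cluster setup is an
assumption, while cls, ichunks, mtl(), and m play the roles described above:

library(snow)
# set up a snow cluster of 4 workers on the local machine
cls <- makeCluster(type="SOCK",c("localhost","localhost","localhost","localhost"))
# split the row indices of m into one chunk per worker; the options()
# calls suppress split()'s "data length is not a multiple" warning
options(warn=-1)
ichunks <- split(1:nrow(m),1:length(cls))
options(warn=0)
# worker i runs mtl(ichunks[[i]],m); the results come back as an R list
counts <- clusterApply(cls,ichunks,mtl,m)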


At this point, we merely need to sum all the elements of counts. Well, I
shouldn’t say “merely,” because there is a little wrinkle to iron out in line 24.
R’s sum() function is capable of acting on several vector arguments,
like this:



> sum(1:2,c(4,10))
[1] 17



But here, counts is an R list, not a (numeric) vector. So we rely on
do.call() to extract the vectors from counts, and then we call sum() on them.
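For instance, here is a toy illustration of the same idea (the values are
made up, standing in for the workers' results):

> counts <- list(c(2,0,1),c(4,10))
> do.call(sum,counts)   # same as sum(c(2,0,1),c(4,10))
[1] 17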
Note lines 9 and 10. As you know, in R, we try to vectorize our computa-
tion wherever possible for better performance. By casting things in matrix-
times-vector terms, we replace the for j and for k loops in the outline in
Section 16.1 by a single vector-based expression.
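To see the idea, suppose m is a 0/1 link matrix as in Section 16.1, so that
the count for a pair of rows i and j is a dot product over the column index
k. Then both inner loops collapse into one matrix-times-vector operation.
Here is a sketch of that step; the helper name mutoutrow is hypothetical,
not necessarily the book's exact code:

# count, for one row i, the mutual links between row i and all rows j > i:
# m[(i+1):n,] %*% m[i,] gives, for each such j, the number of columns k in
# which rows i and j both contain a 1, so no explicit j or k loop is needed
mutoutrow <- function(i,m) {
   n <- nrow(m)
   if (i == n) return(0)
   sum(m[(i+1):n,,drop=FALSE] %*% m[i,])
}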


16.2.3 How Much Speedup Can Be Attained?


I tried this code on a 1000-by-1000 matrix m1000. I first ran it on a 4-worker
cluster and then on a 12-worker cluster. In principle, I should have had
speedups of 4 and 12, respectively. But the actual elapsed times were 6.2 sec-
onds and 5.0 seconds. Compare these figures to the 16.9 seconds runtime
in nonparallel form. (The latter consisted of the call mtl(1:1000,m1000).) So,
I attained a speedup of about 2.7 instead of a theoretical 4.0 for a 4-worker
cluster and 3.4 rather than 12.0 on the 12-node system. (Note that some tim-
ing variation occurs from run to run.) What went wrong?
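As an aside on how such figures are measured: elapsed times can be
collected with R's system.time() function, whose third component is the
wall-clock time. Here is a sketch, assuming cls and ichunks were set up as
earlier; it is an illustration, not the exact session used:

system.time(mtl(1:1000,m1000))                                # serial
system.time(do.call(sum,clusterApply(cls,ichunks,mtl,m1000))) # parallel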
In almost any parallel-processing application, you encounter overhead, or
“wasted” time spent on noncomputational activity. In our example, there is
overhead in the form of the time needed to send our matrix from the man-
ager to the workers. We also encountered a bit of overhead in sending the
function mtl() itself to the workers. And when the workers finish their tasks,
returning their results to the manager causes some overhead, too. We’ll

