warnings that split() would give us (“data length is not a multiple of split
variable”) by calling options().
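A standard way to do this is to set the warn option to a negative value; whether the example's call uses exactly this value is an assumption here.

options(warn = -1)   # a negative warn value suppresses warning messages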
The real work is done in line 23, where we call the snow function
clusterApply(). This function initiates a call to the same specified function
(mtl() here), with some arguments specific to each worker and some
optional arguments common to all. So, here’s what the call in line 23 does:
- Worker 1 will be directed to call the function mtl() with the arguments ichunks[[1]] and m.
- Worker 2 will call mtl() with the arguments ichunks[[2]] and m, and so on for all workers.
- Each worker will perform its assigned task and then return the result to the manager.
- The manager will collect all such results into an R list, which we have assigned here to counts.
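As a rough sketch of this pattern (the cluster setup, worker count, and the names cls and ichunks are illustrative assumptions; only mtl() and m are taken from the example):

library(snow)
cls <- makeCluster(rep("localhost", 2), type = "SOCK")  # hypothetical 2-worker cluster
ichunks <- split(1:nrow(m), 1:length(cls))    # one chunk of row indices per worker
counts <- clusterApply(cls, ichunks, mtl, m)  # worker i calls mtl(ichunks[[i]], m)
# counts is now an R list with one element per worker
stopCluster(cls)

Here each element of ichunks is the worker-specific argument, while m plays the role of the argument common to all workers.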
At this point, we merely need to sum all the elements of counts. Well, I
shouldn’t say “merely,” because there is a little wrinkle to iron out in line 24.
R’s sum() function is capable of acting on several vector arguments,
like this:
sum(1:2,c(4,10))
[1] 17
But here, counts is an R list, not a (numeric) vector. So we rely on
do.call() to extract the vectors from counts, and then we call sum() on them.
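Here is the distinction with a made-up counts list (the numbers are not from the example):

counts <- list(12, 5, 9)   # hypothetical per-worker results
sum(counts)                # error: invalid 'type' (list) of argument
do.call(sum, counts)       # equivalent to sum(12,5,9), giving 26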
Note lines 9 and 10. As you know, in R, we try to vectorize our computation
wherever possible for better performance. By casting things in matrix-times-vector
terms, we replace the for j and for k loops in the outline in
Section 16.1 by a single vector-based expression.
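Here is a generic illustration of that replacement, using a small made-up 0/1 matrix m and row index i rather than the example's data:

m <- matrix(c(1,0,1,1, 0,1,1,0, 1,1,0,1, 0,1,1,1), nrow=4, byrow=TRUE)
i <- 1
n <- nrow(m)
tot <- 0
for (j in (i+1):n)          # explicit double loop over j and k
   for (k in 1:ncol(m))
      tot <- tot + m[i,k] * m[j,k]
tot                         # 5
sum(m[(i+1):n,] %*% m[i,])  # same result from one matrix-times-vector expression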
16.2.3 How Much Speedup Can Be Attained?
I tried this code on a 1000-by-1000 matrix m1000. I first ran it on a 4-worker
cluster and then on a 12-worker cluster. In principle, I should have had
speedups of 4 and 12, respectively. But the actual elapsed times were 6.2 sec-
onds and 5.0 seconds. Compare these figures to the 16.9-second runtime
in nonparallel form. (The latter consisted of the call mtl(1:1000,m1000).) So,
I attained a speedup of about 2.7 instead of a theoretical 4.0 for a 4-worker
cluster and 3.4 rather than 12.0 on the 12-node system. (Note that some tim-
ing variation occurs from run to run.) What went wrong?
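Elapsed times like these can be measured with R's system.time() function; the names cls and ichunks below are the placeholder names from the earlier sketch, not part of the example itself.

system.time(mtl(1:1000, m1000))                      # serial run; see the "elapsed" entry
system.time(clusterApply(cls, ichunks, mtl, m1000))  # parallel run on the cluster
# speedup = serial elapsed time / parallel elapsed time, e.g., 16.9 / 6.2, or about 2.7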
In almost any parallel-processing application, you encounter overhead, or
“wasted” time spent on noncomputational activity. In our example, there is
overhead in the form of the time needed to send our matrix from the man-
ager to the workers. We also encountered a bit of overhead in sending the
function mtl() itself to the workers. And when the workers finish their tasks,
returning their results to the manager causes some overhead, too. We’ll