The Art of R Programming

(WallPaper) #1
handles a given value ofidoes an amount of work proportional ton-i. That,
in fact, is why we staggered the values ofiin our actual code: Thread 0 han-
dled theivalues 0, 4, 8 ..., thread 1 worked on 1, 5, 9, ..., and so on, yielding
good load balance.
The point then is that static assignment might require a bit more plan-
ning. One general approach to this is to randomly assign tasks (ivalues, in
our case here) to threads (still doing so at the outset, before work begins).
With a bit of forethought such as this, static assignment should work well in
most applications.

16.4.4 Software Alchemy: Turning General Problems into Embarrassingly
Parallel Ones
As discussed earlier, it’s difficult to attain good performance from non-
embarrassingly parallel algorithms. Fortunately, for statistical applications,
there is a way to turn nonembarrassingly parallel problems into embarrass-
ingly parallel ones. The key is to exploit some statistical properties.
To demonstrate the method, let’s once again turn to our mutual out-
links problem. The method, applied withwworkers on a links matrixm,
consists of the following:


  1. Break the rows ofmintowchunks.

  2. Have each worker find the mean number of mutual outlinks for pairs of
    vertices in its chunk.

  3. Average the results returned by the workers.


It can be shown mathematically that for large problems (the only ones
you would need parallel computing for anyway), this chunked approach
gives the estimators of the same statistical accuracy as in the nonchunked
method. But meanwhile, we’ve turned a nonparallel problem into not just a
parallel one but an embarrassingly parallel one! The workers in the preced-
ing outline compute entirely independently of each other.
This method should not be confused with the usual chunk-based
approaches in parallel processing. In those, such as the merge-sort exam-
ple discussed on page 347, the chunking is embarrassingly parallel, but the
combining of results is not. By contrast, here the combining of results con-
sists of simple averaging, thanks to the mathematical theory.
I tried this approach on the mutual outlinks problem in a 4-workersnow
cluster. This reduced the runtime to 1.5 seconds. This is far better than the
serial time of about 16 seconds, double the speedup obtained by the GPU
and approaching comparability to the OpenMP time. And the theory show-
ing that the two methods give the same statistical accuracy was confirmed as
well. The chunked method found the mean number of mutual outlinks to
be 249.2881, compared to 249.2993 for the original estimator.

350 Chapter 16

Free download pdf