The Art of R Programming

handles a given value ofidoes an amount of work proportional ton-i. That, in fact, is why we staggered the values ofiin our actual code: Thread 0 han- dled theivalues 0, 4, 8 ..., thread 1 worked on 1, 5, 9, ..., and so on, yielding good load balance. The point then is that static assignment might require a bit more plan- ning. One general approach to this is to randomly assign tasks (ivalues, in our case here) to threads (still doing so at the outset, before work begins). With a bit of forethought such as this, static assignment should work well in most applications.

16.4.4 Software Alchemy: Turning General Problems into Embarrassingly Parallel Ones As discussed earlier, it’s difficult to attain good performance from nonembarrassingly parallel algorithms. Fortunately, for statistical applications, there is a way to turn nonembarrassingly parallel problems into embarrassingly parallel ones. The key is to exploit some statistical properties. To demonstrate the method, let’s once again turn to our mutual outlinks problem. The method, applied withwworkers on a links matrixm, consists of the following:

Break the rows ofmintowchunks.

Have each worker find the mean number of mutual outlinks for pairs of
vertices in its chunk.

Average the results returned by the workers.

It can be shown mathematically that for large problems (the only ones you would need parallel computing for anyway), this chunked approach gives the estimators of the same statistical accuracy as in the nonchunked method. But meanwhile, we’ve turned a nonparallel problem into not just a parallel one but an embarrassingly parallel one! The workers in the preced- ing outline compute entirely independently of each other. This method should not be confused with the usual chunk-based approaches in parallel processing. In those, such as the merge-sort exam- ple discussed on page 347, the chunking is embarrassingly parallel, but the combining of results is not. By contrast, here the combining of results consists of simple averaging, thanks to the mathematical theory. I tried this approach on the mutual outlinks problem in a 4-workersnow cluster. This reduced the runtime to 1.5 seconds. This is far better than the serial time of about 16 seconds, double the speedup obtained by the GPU and approaching comparability to the OpenMP time. And the theory show- ing that the two methods give the same statistical accuracy was confirmed as well. The chunked method found the mean number of mutual outlinks to be 249.2881, compared to 249.2993 for the original estimator.

350 Chapter 16

The Art of R Programming

Get our desktop app

Company

Features

Documentation

Resources