16.4.1 Sources of Overhead
Having at least a rough idea of the physical causes of overhead is essential to
successful parallel programming. Let’s take a look at these in the contexts of
the two main platforms, shared-memory and networked computers.
16.4.1.1 Shared-Memory Machines
As noted earlier, the memory sharing in multicore machines makes for eas-
ier programming. However, the sharing also produces overhead, since the
two cores will bump into each other if they both try to access memory at
the same time. This means that one of them will need to wait, causing over-
head. That overhead is typically in the range of hundreds of nanoseconds
(billionths of seconds). This sounds really small, but keep in mind that the
CPU is working at a subnanosecond speed, so memory access often becomes
a bottleneck.
Each core may also have a cache, in which it keeps a local copy of some
of the shared memory. The cache is intended to reduce contention for
memory among the cores, but it produces its own overhead, in the form of
time spent keeping the caches consistent with each other.
Recall that GPUs are special types of multicore machines. As such, they
suffer from the problems I've described, and more. First, the latency, which
is the time delay between a memory read request and the arrival of the first
bit at the GPU from its memory, is quite long.
There is also the overhead incurred in transferring data between the
host and the device. The latency here is on the order of microseconds
(millionths of seconds), an eternity compared to the nanosecond scale of
the CPU and GPU.
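To get a feel for what that difference means, consider a quick back-of-
envelope calculation. The clock rate and latency figures below are assumed,
chosen only to illustrate the order of magnitude involved:

> clockrate <- 3e9     # assumed CPU clock, in cycles per second
> latency <- 10e-6     # assumed host-to-device latency, in seconds
> clockrate * latency  # CPU cycles spent waiting for one transfer to begin
[1] 30000

In other words, each transfer costs the equivalent of tens of thousands of
CPU cycles before a single byte of data arrives.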
GPUs have great performance potential for certain classes of applica-
tions, but overhead can be a major issue. The authors of gputools note that
their matrix operations start achieving a speedup only at matrix sizes of 1000
by 1000. I wrote a GPU version of our mutual outlinks application, which
turned out to have a runtime of 3.0 seconds, about half that of the snow version
but still far slower than the OpenMP implementation.
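If you want to see where the crossover occurs on your own hardware, a
timing comparison along the lines of the following sketch can be used. It
assumes that the gputools package is installed and that a CUDA-capable GPU
is present; gpuMatMult() is that package's GPU matrix-multiply function.

library(gputools)

# compare CPU and GPU matrix multiply times for an n x n matrix
timecmp <- function(n) {
   a <- matrix(runif(n^2),nrow=n)
   b <- matrix(runif(n^2),nrow=n)
   cputime <- system.time(a %*% b)["elapsed"]
   gputime <- system.time(gpuMatMult(a,b))["elapsed"]
   c(n=n,cpu=cputime,gpu=gputime)
}

# on many systems the GPU pulls ahead only near n = 1000
sapply(c(100,250,500,1000,2000),timecmp)

For small matrices, the host-to-device transfer and kernel launch costs
swamp whatever computational speedup the GPU provides.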
Again, there are ways of ameliorating these problems, but they require
very careful, creative programming and a sophisticated knowledge of the
physical GPU structure.
16.4.1.2 Networked Systems of Computers
As you saw earlier, another way to achieve parallel computation is through
networked systems of computers. You still have multiple CPUs, but in this
case, they are in entirely separate computers, each with its own memory.
As pointed out earlier, network data transfer causes overhead. Its latency
is again on the order of microseconds. Thus, even accessing a small amount
of data across the network incurs a major delay.
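A rough model of the cost of sending n bytes is the latency plus n divided
by the bandwidth. The figures below, a 100-microsecond latency and a
1-gigabit-per-second link, are assumptions for illustration only, but they
show why small messages are dominated by latency:

> latency <- 100e-6         # assumed network latency, seconds per message
> bandwidth <- 1e9 / 8      # assumed 1 Gb/s link, in bytes per second
> xfertime <- function(nbytes) latency + nbytes / bandwidth
> xfertime(8)               # a single double: almost pure latency
[1] 0.0001000064
> xfertime(8e6)             # a million doubles: bandwidth now dominates
[1] 0.0641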
Also note that snow has additional overhead, as it changes numeric
objects such as vectors and matrices to character form before sending
them, say from the manager to the workers. Not only does this entail time
for the conversion (both in changing from numeric to character form and
back again), but the character form also tends to be much bulkier than the
numeric one, which lengthens the transfer time.
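You can get a sense of the size penalty with a short experiment. The
following is only a sketch, comparing R's binary and ASCII serializations
of a numeric vector rather than reproducing exactly what snow does
internally, but it illustrates how much a character-based representation
inflates the data:

> x <- runif(100000)
> length(serialize(x,NULL))              # binary form, roughly 800,000 bytes
> length(serialize(x,NULL,ascii=TRUE))   # ASCII/character form, several times larger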