Also, note carefully that even being embarrassingly parallel does not
make an algorithm efficient. Some such algorithms can still have significant
communication traffic, thus compromising performance.
Consider the KMC problem, run under snow. Suppose we were to set
up a large enough number of workers so that each worker had relatively
little work to do. In that case, the communication with the manager after
each iteration would become a significant portion of run time. In this
situation, we would say that the granularity is too fine, and we would then
probably switch to using fewer workers. We would then have larger tasks
for each worker, thus a coarser granularity.
16.4.3 Static Versus Dynamic Task Assignment
Look again at the loop beginning on line 26 of our OpenMP example,
reproduced here for convenience:
for (i = me; i < nval; i += nth) {
mysum += procpairs(i,m,nval);
}
The variable me here was the thread number, so the effect of this code
was that the various threads would work on nonoverlapping sets of values of
i. We do want the values to be nonoverlapping, to avoid duplicate work and
an incorrect count of the total number of links, so the code was fine. But the
point now is that we were, in effect, preassigning the tasks that each thread
would handle. This is called static assignment.
An alternative approach is to revise the for loop to look something
like this:
int nexti = 0; // global variable
...
int myi = 0; // this thread's current index; 0 allows loop entry
for ( ; myi < n; ) { // revised "for" loop
#pragma omp critical
{
myi = nexti; // grab the next unclaimed value of i
nexti += 1; // advance the shared counter
}
if (myi < n) {
mysum += procpairs(myi,m,nval);
...
}
}
...
This is dynamic task assignment, in which it is not determined ahead of
time which threads handle which values of i. Task assignment is done dur-
ing execution. At first glance, dynamic assignment seems to have the poten-
tial for better performance. Suppose, for instance, that in a static assignment