[Figure 16.4. Hypercubes of order 0 (single dot) to order 4 (two cubes, one inside the other).]
it on a parallel machine. Each job can partly be done in parallel, but there will
always be some parts that have to be carried out sequentially. Suppose that on a
sequential computer, a fraction $p_\mathrm{seq}$ of the total effort consists of jobs which are
intrinsically sequential and that a fraction $p_\mathrm{par} = 1 - p_\mathrm{seq}$ can in principle be carried
out in parallel. If we ran the program on a parallel machine and distributed the
parallel parts of the code over $P$ processors, the total run time would decrease by a
factor
\[
\frac{t_\mathrm{seq}}{t_\mathrm{par}}
= \frac{p_\mathrm{seq} + p_\mathrm{par}}{p_\mathrm{seq} + p_\mathrm{par}/P}
= \frac{1}{p_\mathrm{seq} + (1 - p_\mathrm{seq})/P}
\tag{16.1}
\]
($t_\mathrm{seq}$ and $t_\mathrm{par}$ are the total run times on the sequential and parallel machine
respectively). The processor efficiency is defined by
\[
\mathrm{P.E.} = \frac{1}{P p_\mathrm{seq} + (1 - p_\mathrm{seq})}.
\tag{16.2}
\]
The processor efficiency is equal to 1 if all the work can be evenly distributed over
the $P$ processors.
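To make the two expressions concrete, here is a minimal sketch in Python (the language and the 5% sequential fraction are illustrative assumptions, not taken from the text) that evaluates Eqs. (16.1) and (16.2):

```python
def speedup(p_seq: float, P: int) -> float:
    """Run-time ratio t_seq / t_par of Eq. (16.1)."""
    return 1.0 / (p_seq + (1.0 - p_seq) / P)

def processor_efficiency(p_seq: float, P: int) -> float:
    """Processor efficiency of Eq. (16.2); equals speedup(p_seq, P) / P."""
    return 1.0 / (P * p_seq + (1.0 - p_seq))

# Assumed example: a program that is 5% intrinsically sequential,
# run on 64 processors.
print(speedup(0.05, 64))               # ~15.4, far below the ideal 64
print(processor_efficiency(0.05, 64))  # ~0.24
```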
Even if we could make $P$ arbitrarily large, the maximum decrease in run time
can never exceed $1/p_\mathrm{seq}$ – this is Amdahl's law – and the processor efficiency will
decrease as $1/P$. We see that for large $P$, the gain in performance from adding
processors decays to zero.
Amdahl's law should not discourage us too much, however, since if we have a larger
machine, we usually tackle larger problems, and as a rule, the parallelisable parts
scale with a higher power of the problem size than the sequential parts. Therefore,
for really challenging problems, a processor efficiency of more than 90% can often
be achieved.
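Both effects, the saturation of the speedup at $1/p_\mathrm{seq}$ and the recovery of high efficiency when a larger problem shrinks the sequential fraction, can be seen numerically in the following sketch; the sequential fractions used are assumed example values:

```python
def speedup(p_seq: float, P: int) -> float:
    """Run-time ratio t_seq / t_par of Eq. (16.1)."""
    return 1.0 / (p_seq + (1.0 - p_seq) / P)

# With p_seq = 0.05 the speedup saturates at 1/p_seq = 20 (Amdahl's law),
# while the processor efficiency speedup/P decays as 1/P:
for P in (10, 100, 1000, 10000):
    s = speedup(0.05, P)
    print(f"P = {P:5d}: speedup = {s:6.2f}, efficiency = {s / P:.4f}")

# On a larger problem the sequential fraction shrinks; with the assumed
# value p_seq = 0.001, even P = 100 keeps the efficiency above 90%:
print(speedup(0.001, 100) / 100)   # ~0.91
```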
16.3.2 Programming implications
Writing programs which exploit the potential of parallel computers is different
from vectorising, as there are many more operations and procedures that can be
parallelised than can be vectorised. Moreover, we might want to tailor our program
to the particular network topology of the machine we are working on. Ideally,