Computational Physics


542 High performance computing and parallelism


Figure 16.1. A pipeline for adding vectors. [Figure not reproduced: the operand pairs a(i+4), b(i+4) down to a(i-1), b(i-1) pass in turn through the stages "compare exponents", "line up exponents", "add mantissas" and "adjust format".]

Shift exponent such that the mantissa lies between 0.1 and 1.0;
Write result to memory.

Disregarding the load from and write to memory, we still have four steps to carry
out for the addition. Each of these steps requires at least one clock cycle, and a
conventional processor has to wait until the last step has been completed before it
can accept a new command.
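The four steps can be made concrete with a toy model. The following is a minimal sketch, assuming positive numbers stored as a decimal mantissa between 0.1 and 1.0 together with an integer exponent; the names `to_toy` and `add_toy` are hypothetical, and real hardware works in binary with a fixed number of mantissa digits.

```python
from fractions import Fraction

TENTH = Fraction(1, 10)

def normalise(m, e):
    """Shift the exponent until the mantissa lies in [0.1, 1.0)."""
    while m >= 1:
        m, e = m / 10, e + 1
    while m < TENTH:
        m, e = m * 10, e - 1
    return m, e

def to_toy(x):
    """Split a positive number into (mantissa, exponent), value = m * 10**e."""
    return normalise(Fraction(x), 0)

def add_toy(a, b):
    """Add two positive toy numbers using the four steps named in the text."""
    (ma, ea), (mb, eb) = a, b
    # Step 1: compare exponents.
    shift = ea - eb
    # Step 2: line up exponents by shifting the mantissa of the smaller number.
    if shift >= 0:
        mb, e = mb / Fraction(10) ** shift, ea
    else:
        ma, e = ma / Fraction(10) ** (-shift), eb
    # Step 3: add mantissas.
    m = ma + mb
    # Step 4: adjust format so that the mantissa is back in [0.1, 1.0).
    return normalise(m, e)

m, e = add_toy(to_toy("9.5"), to_toy("0.75"))
print(m, e)  # mantissa 41/400 (= 0.1025) and exponent 2, i.e. 10.25
```

Exact fractions are used here only to keep the illustration free of rounding; each of the four commented steps corresponds to one stage that costs at least one clock cycle.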
A pipeline processor, however, can perform the different operations needed to
add two floating-point numbers at the same time (in parallel). This is of no use
when only two numbers are added, as this calculation must be completed before
starting execution of the next statement. However, if we have a sequence of similar
operations to be carried out, as in the addition of two vectors:


FOR i = 1 TO N DO
  c[i] = a[i] + b[i];
END FOR

then it is possible to have the processor comparing the exponents of a[i+3] and
b[i+3], lining up the exponents of a[i+2] and b[i+2], adding the mantissas of
a[i+1] and b[i+1] and putting a[i] and b[i] into the right format simultaneously.
Of course this process acts at full speed only after a[4] and b[4] have been loaded
into the processor and only until a[N] and b[N] have entered it. Starting up and
emptying the pipeline therefore represent a small overhead. Figure 16.1 shows how
the process works and also renders the analogy with the pipeline obvious.
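The start-up and emptying overhead can be counted explicitly. The following is a minimal sketch, assuming an idealised four-stage pipeline in which every stage takes exactly one clock cycle and memory traffic is ignored; `STAGES` and `pipelined_add` are hypothetical names, and the actual sum is simply computed when a pair leaves the last stage.

```python
STAGES = ["compare exponents", "line up exponents", "add mantissas", "adjust format"]

def pipelined_add(a, b):
    """Return (c, cycles) for c[i] = a[i] + b[i] on an idealised pipeline."""
    n, depth = len(a), len(STAGES)
    pipe = [None] * depth      # pipe[s] is the index of the pair in stage s
    c = [None] * n
    fed = done = cycles = 0
    while done < n:
        # Every pair moves one stage onward; a new pair enters if any remain.
        pipe = [fed if fed < n else None] + pipe[:-1]
        if fed < n:
            fed += 1
        cycles += 1
        # The pair that occupied the last stage during this cycle is finished.
        i = pipe[-1]
        if i is not None:
            c[i] = a[i] + b[i]
            done += 1
    return c, cycles

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0] * 5
c, cycles = pipelined_add(a, b)
print(c, cycles, len(STAGES) * len(a))  # 8 pipelined cycles versus 20 unpipelined
```

For N pairs the loop runs N + 3 cycles: three cycles to fill the pipeline, after which one result emerges per cycle. A processor that waits for all four steps before accepting the next pair would need 4N cycles under the same assumptions.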
In the course of this pipeline process, one addition is completed in each clock
cycle. We call an addition or multiplication of real numbers a floating-point
operation (FLOP). We see that the pipeline arrangement makes it possible to
perform one FLOP per clock cycle. Multiplication and division require many
more than four clock cycles in a conventional processor, and if each of the steps
involved in the multiplication or division can be executed concurrently in a pipeline
