single processor. If we have p < N / 2 processors, we can perform a “preprocessing” step that reduces the number of values to 2p, and then this algorithm can continue as shown above.
You should see that there are lg N passes of this algorithm, putting the time at O(lg N). You may recall that we talked about the cost of an algorithm in Section 7.1.3. We said that the cost was the time multiplied by the number of processors, so the cost for this algorithm is (N / 2) * O(lg N), or more simply O(N lg N). The simple sequential algorithm we considered in Chapter 1 took only O(N), so this parallel version is more costly, although it does run much faster.
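To make these figures concrete, here is a short Python sketch (our illustration, not part of the text's algorithm) that tabulates the (N / 2) * lg N cost of the parallel algorithm against the roughly N comparisons of the sequential scan for a few values of N:

    import math

    # Cost = number of processors * parallel running time.
    # Here: N / 2 processors running for lg N passes, versus the
    # roughly N - 1 comparisons of the simple sequential scan.
    for n in (16, 1024, 1048576):
        lg_n = int(math.log2(n))
        parallel_cost = (n // 2) * lg_n
        sequential_cost = n - 1
        print(n, parallel_cost, sequential_cost)

For N = 1,024, for example, the parallel cost is 512 * 10 = 5,120 comparisons' worth of work, even though the elapsed time is only 10 passes.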
If parallel computing is really beneficial, there must be a faster alternative
method that will cost no more than the sequential version. If we look closely,
we see that the problem with the cost is the number of processors. We need to
consider how we can reduce this number. If we want the total cost at the opti-
mal level of O(N) and the run time of the parallel algorithm is O(lg N), it must
be the case that we can only use N / lg N processors. This also means that the
first pass must have each processor handle N / (N / lg N) values, which is lg N.
This results in the following alternative parallel algorithm:
Parallel Start
  for j = 1 to N / lg N do
    Pj finds the maximum of M[1+(j-1)*lg N] through M[j*lg N]
      using the sequential algorithm
    Pj writes the maximum to M[j]
  end for
Parallel End
count = (N / lg N) / 2
for i = 1 to (lg count) + 1 do
  Parallel Start
    for j = 1 to count do
      Pj reads M[2j-1] into X and M[2j] into Y
      if X > Y
        Pj writes X into M[j]
      else
        Pj writes Y into M[j]
      end if
    end for j
  Parallel End
  count = count / 2
end for i
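To see this second algorithm in action, the following Python sketch simulates it one pass at a time (the function name, the 0-based indexing, and the sample values are ours; on a real parallel machine each inner loop body would run on its own processor rather than one after another):

    import math

    def find_max_parallel_sim(m):
        """Simulate the N / lg N processor maximum algorithm on the list m.

        Assumes N is a power of 2 and N / lg N is also a power of 2,
        matching the assumptions in the text.
        """
        n = len(m)
        lg_n = int(math.log2(n))
        groups = n // lg_n                  # the number of "processors"
        m = list(m)

        # First pass: "processor" j finds the maximum of its block of
        # lg N values sequentially and writes it into position j.
        for j in range(groups):
            m[j] = max(m[j * lg_n:(j + 1) * lg_n])

        # Remaining passes: pair up the surviving values, exactly as in
        # the second loop of the pseudocode (shifted to 0-based indexing).
        count = groups // 2
        while count >= 1:
            for j in range(count):
                x, y = m[2 * j], m[2 * j + 1]
                m[j] = x if x > y else y
            count //= 2
        return m[0]

    values = [7, 3, 9, 1, 15, 4, 8, 2, 11, 6, 0, 13, 5, 10, 14, 12]
    print(find_max_parallel_sim(values))    # prints 15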