We have already described the SISD type: this is the von Neumann architecture,
in which a single processor processes a single data stream, the sequence of data
fetched from or written to memory by the processor. The second type, MISD, is not
used in practice but features in the list for the sake of completeness. The last two
types are important for parallel machines. SIMD machines consist of arrays of
processor elements whose functional units are driven by a single control unit: this
unit broadcasts each instruction to all processor elements, which then carry out the
same operation, each on its own data stream. MIMD machines consist of an array
of processors which can run entire programs independently, and these programs
may be different for each processor. In very large-scale scientific problems, most
of the work often consists of repeating the same operation over and over. SIMD
architectures are suitable for such problems, as the processor elements can be made
faster than the full processors in a MIMD machine at the same cost. MIMD machines
obviously offer greater flexibility and can be used as multi-user machines, which is
impossible for SIMD computers.
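To make the distinction concrete, the following C sketch mimics both styles with
OpenMP: a simd loop in which every lane applies one and the same operation to its
own array element, and parallel sections in which two independent instruction
streams run different code. This is only an illustration of the two execution
models on a conventional processor, not of any specific machine; the function
names are invented for this example.

    #include <stdio.h>

    #define N 8

    /* SIMD style: one instruction stream applied to many data elements;
     * every lane of the loop performs the same multiply-add. */
    static void simd_style(double *a, const double *b)
    {
        #pragma omp simd
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * a[i] + b[i];
    }

    /* MIMD style: independent instruction streams; each section runs
     * its own (here deliberately different) program. */
    static void mimd_style(double *a, double *b)
    {
        #pragma omp parallel sections
        {
            #pragma omp section
            { for (int i = 0; i < N; i++) a[i] = a[i] * a[i]; }
            #pragma omp section
            { for (int i = 0; i < N; i++) b[i] = b[i] + 1.0; }
        }
    }

    int main(void)
    {
        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = (double)i; b[i] = 1.0; }
        simd_style(a, b);   /* same operation on every element     */
        mimd_style(a, b);   /* two different programs side by side */
        printf("a[3] = %g, b[3] = %g\n", a[3], b[3]);
        return 0;
    }

Compiled with OpenMP support (for example gcc -fopenmp), the pragmas map the first
loop onto the processor's vector lanes and the sections onto independent threads;
without such support the program still runs correctly, but serially.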
Another classification of parallel machines is based on the structure of the
memory. We have distributed memory and shared memory architectures. In the
latter, all processors can access the same memory area using some communication
network (sometimes they can also communicate the contents of vector registers
or cache to the other processors). In distributed systems on the other hand, each
processor has its own local memory, and processors can interact with each other
(for example to exchange data from their local memories) via the communication
network. Some machines can operate in both modes. Shared memory architectures
are easier to program than distributed memory computers as there is only a single
address space. Modifying a conventional program for a shared memory machine
is rather easy. However, memory access is often a bottleneck with these machines.
This becomes clear when you realise that memory bank conflicts are much more
likely to occur when 10 processors access the memory simultaneously than when a
single one does. Distributed memory systems are more difficult to program, but they offer more
potential if they are programmed adequately. The shared memory model is adopted
in several supercomputers and parallel workstations containing of the order of 10
powerful vector processors. However, the most powerful machines at present and
for the future seem to be of the distributed memory type, or a mixture of both.
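The programming contrast between the two memory models can be sketched briefly.
On a shared memory machine, a processor can simply read a neighbour's data from
the common address space; on a distributed memory machine the same data must be
sent explicitly as a message over the communication network. The fragment below
shows the distributed case using standard MPI point-to-point calls; the
ring-exchange pattern chosen here is only an illustrative example.

    #include <stdio.h>
    #include <mpi.h>

    /* Distributed memory: each process owns a value in its local memory
     * and must exchange it explicitly with its neighbours in a ring. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;        /* data visible only locally   */
        double from_left;
        int right = (rank + 1) % size;      /* neighbour ranks in the ring */
        int left  = (rank - 1 + size) % size;

        /* Send our value to the right neighbour while receiving the left
         * neighbour's value; the combined call avoids deadlock. */
        MPI_Sendrecv(&local, 1, MPI_DOUBLE, right, 0,
                     &from_left, 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("process %d received %g from its left neighbour %d\n",
               rank, from_left, left);

        MPI_Finalize();
        return 0;
    }

On a shared memory machine no such call is needed: each thread could read its
neighbour's value directly from a shared array, which is precisely why modifying
a conventional program for such a machine is comparatively easy.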
In parallel machines, the nodes, each consisting of a processor, perhaps with local
(distributed) memory, must be connected such that data communication can proceed
fast, while keeping an eye on the overall machine cost. We shall mention some of these
configurations very briefly. The most versatile option would be to connect each
processor to each of the others, but this is far too expensive for realistic designs.
Therefore, alternatives have been developed which are either tailored to particular
problems, or offer flexibility through the possibility of making different connections