Analysis of Algorithms : An Active Learning Approach

(Ron) #1

96 SORTING ALGORITHMS


It is also possible to be able to declare an array large enough to hold all of
the data but that the logical memory needs of the program are much greater
than the physical memory available in the computer. This places a reliance on
the computer to effectively implement virtual memory, but in even the best of
circumstances there may be a large amount of data that needs to be swapped
between physical memory and a disk drive. Even with an effective sorting
algorithm like Quicksort, the relationship between the bounds of a partition
and the blocks of logical memory that can be and are loaded may be such that
a large number of blocks will have to be swapped in and out of physical mem-
ory. This problem may not be seen until a program is implemented and runs so
slowly that computer-based analysis tools are needed to identify the problem.
Even in that case, the problem may not be found unless system profiling tools
are available that track virtual memory use.
Our analysis looked at comparison operations to determine what was an
efficient sort algorithm. But the amount of time that can be spent writing
information to and from the disk in the process of a virtual memory block
swap will be much more significant than any logical or arithmetic operation.
Because this is handled by the operating system, we have no real control over
when swaps may occur.
An alternative thought might be to use a direct access file on disk and con-
vert each array access into a seek operation to move to the correct location of
the file, followed by a read. This reduces the amount of logical memory needed
and so reduces the reliance on virtual memory. This still translates to a signifi-
cant amount of disk input and output, which is what is costly, whether done
by the program or the operating system.
All of this makes the sorting algorithms in the last seven sections impractical
when the data set gets extremely large. We will now look at an alternative that
will use four external sequential files and a merging process to accomplish a
sort.
We first identify how many records can reasonably be held in memory
when we account for the size of the executable code and the available memory.
We will declare an array of this size, call it S, that will be used for the two steps
of our sort process. In the first step, we will read in S records and use an appro-
priate internal sort to put these records in order. This set of now sorted records
will be written to file A. We read in a second set of S records, sort them, and
write them to file B. We continue this process, alternating where we write the
Free download pdf