(^58) | Chapter 4: Sorting Algorithms
computers of 50 years ago, the size of the data sets being processed is now on the
order of terabytes of information. Although you may not be called upon to sort
such huge data sets, you will likely need to sort perhaps tens or hundreds of thou-
sands of items. In this chapter, we cover the most important sorting algorithms
and present results from our benchmarks to give specific guidance on which algo-
rithm is best suited for your particular problem.
Terminology
A collection of comparable elementsAis presented to be sorted; we use the nota-
tionsA[i] andaito refer to theith element of the collection. By convention, the
first element in the collection is considered to beA[0]. For notational conve-
nience, we defineA[low,low+n) to be the subcollectionA[low]...A[low+n–1] ofn
elements, whereasA[low,low+n] describes the subcollectionA[low]...A[low+n]
ofn+1 elements.
To sort a collection, the intent is to order the elementsAsuch that ifA[i]<A[j],
theni<j. If there are duplicate elements, then in the resulting ordered collection
these elements must be contiguous; that is, ifA[i]=A[j] in a sorted collection, then
there can be nok such thati<k<jand A[i]≠A[k].
The sorted collectionAmust be a permutation of the elements that originally
formedA. In the algorithms presented here, the sorting is done “in place” for effi-
ciency, although one might simply wish to generate a new sorted collection from
an unordered collection.
Representation
The collection may already be stored in the computer’s random access memory
(RAM), but it might simply exist in a file on the filesystem, known as secondary
storage. The collection may be archived in part on tertiary storage, which may
require extra processing time just to locate the information; in addition, the infor-
mation may need to be copied to secondary storage before it can be processed.
Examples of tertiary storage include tape libraries and optical jukeboxes.
Information stored in RAM typically takes one of two forms: pointer-based or
value-based. Assume one wants to sort the strings “eagle,” “cat,” “ant,” “dog,”
and “ball.” Using pointer-based storage, shown in Figure 4-2, an array of informa-
tion (the contiguous boxes) contains pointers to the actual information (i.e., the
strings in ovals) rather than storing the information itself. Such an approach
enables arbitrarily complex records to be stored and sorted.
Figure 4-2. Sorting using pointer-based storage
Algorithms in a Nutshell
Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
9780596516246 Publisher: O'Reilly Media, Inc.
Prepared for Ming Yi, Safari ID: [email protected]
Licensed by Ming Yi
Print Publication Date: 2008/10/21 User number: 594243
© 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use
tina meador
(Tina Meador)
#1