Overview | 59
Sorting
Algorithms
By contrast, value-based storage packs a collection ofnelements into record
blocks of a fixed size,s(suitable for tertiary or secondary storage). Figure 4-3
shows how to store the same information shown in Figure 4-2 using a contiguous
block of storage containing a set of rows of exactlys=5 bytes each. In this example
the information is shown as strings, but it could be any collection of structured,
record-based information. The “¬” character represents the termination of the
string; in this encoding, strings of lengthsdo not have a terminating character.
The information is contiguous and can be viewed as a one-dimensional array
B[0,n*s). Note thatB[r*s+c] is thecthletter of therthword (wherec≥0 andr≥0);
also, theith element of the collection (fori≥0) is the subarrayB[i*s,(i+1)*s).
Information is written to secondary storage usually as a value-based contiguous
collection of bytes. It is possible to store “pointers” of a sort on secondary storage
by using integer values to refer to offset position locations within files on disk.
The algorithms in this chapter can also be written to work with disk-based infor-
mation simply by implementing swap functions that transpose bytes within the
files on disk; however, the resulting performance will differ because of the
increased input/output costs in accessing secondary storage.
Whether pointer-based or value-based, a sorting algorithm updates the informa-
tion (in both cases, the boxes) so that the entriesA[0] ...A[n–1] are ordered. For
convenience, in the rest of this chapter we use theA[i] notation to represent theith
element, even when value-based storage is being used.
Comparable Elements
The elements in the collection being compared must admit a total ordering. That
is, for any two elementspandqin a collection, exactly one of the following three
predicates is true:p=q,p<q,orp>q. Commonly sorted primitive types include
integers, floating-point values, and characters. When composite elements are
sorted (such as strings of characters) then a lexicographical ordering is imposed
on each individual element of the composite, thus reducing a complex sort into
individual sorts on primitive types. For example, the word “alphabet” is consid-
ered to be less than “alternate” but greater than “alligator” by comparing each
individual letter, from left to right, until a word runs out of characters or an indi-
vidual character in one word is less than or greater than its partner in the other
word (thus “ant” is less than “anthem”). This question of ordering is far from
simple when considering capitalization (is “A” greater than “a”?), diacritical
marks (is “è” less than “ê”?) and diphthongs (is “æ” less than “a”?). Note that the
powerful Unicode standard (see http://www.unicode.org/versions/latest) uses
Figure 4-3. Sorting using value-based storage
Algorithms in a Nutshell
Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
9780596516246 Publisher: O'Reilly Media, Inc.
Prepared for Ming Yi, Safari ID: [email protected]
Licensed by Ming Yi
Print Publication Date: 2008/10/21 User number: 594243
© 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use