Algorithms in a Nutshell

(Tina Meador) #1
Hash-based Search | 119

Searching

were ordered in some way, the hashing method that inserts elements into the hash
tableA does not attempt to replicate this ordering withinA.
The input to HASH-BASEDSEARCHis the computed hash table,A, and the target
elementtbeing sought. The algorithm returnstrueiftexists in the linked list
stored byA[h] whereh=hash(t). IfA[h] is empty ortdoes not exist within the
linked list stored byA[h], thenfalseis returned to indicate thattis not present in
A(and by implication, it does not exist inC). The pseudocode in Figure 5-3
shows the simplified version whereAstores lists containing the elements them-
selves as key values.

Assumptions

The variablenrepresents the number of elements in the original collectionCand
b represents the number of bins in the indexed collection,A.

Context


Suppose we are creating a text editor and want to add a component that will
check the spelling of words as the user types. There are several free word lists
available that can be downloaded from the Internet (seehttp://www.wordlist.com).
Performance is essential for this application. We must be able to search the word
list quickly, or the program will be unusable for even the slowest typists. We will
probably check word spellings in a separate thread that must keep up with the
changes to the document or file.*
We must keep the words in memory since disk access will degrade performance
too much. A typical English word list contains more than 200,000 words and
could be stored in an array that would use about 2.5MB memory (this includes
space for the words themselves and a list of pointers to the words, as shown in
Figure 5-5). We use pointers because word lengths vary and we want to optimize
memory usage.

We know from the earlier section “Binary Search” that we can expect about 18 string
comparisons†on average if we use BINARYSEARCH. String comparisons can be
expensive—even when we optimize the code, there is a loop that comparesbytes.

* Hash-based searches require the complete word before the search can begin. With a binary search
approach, one may search for the work incrementally as it is being typed, but this introduces ad-
ditional complexity to the program.

Figure 5-5. Storing strings for hash-based searching

† log (200,000) = 17.61

Algorithms in a Nutshell
Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
9780596516246 Publisher: O'Reilly Media, Inc.


Prepared for Ming Yi, Safari ID: [email protected]
Licensed by Ming Yi
Print Publication Date: 2008/10/21 User number: 594243
© 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use

Free download pdf