Algorithms in a Nutshell

Hash-based Search | 119

Searching

were ordered in some way, the hashing method that inserts elements into the hash tableA does not attempt to replicate this ordering withinA. The input to HASH-BASEDSEARCHis the computed hash table,A, and the target elementtbeing sought. The algorithm returnstrueiftexists in the linked list stored byA[h] whereh=hash(t). IfA[h] is empty ortdoes not exist within the linked list stored byA[h], thenfalseis returned to indicate thattis not present in A(and by implication, it does not exist inC). The pseudocode in Figure 5-3 shows the simplified version whereAstores lists containing the elements themselves as key values.

Assumptions

The variablenrepresents the number of elements in the original collectionCand b represents the number of bins in the indexed collection,A.

Context

Suppose we are creating a text editor and want to add a component that will check the spelling of words as the user types. There are several free word lists available that can be downloaded from the Internet (seehttp://www.wordlist.com). Performance is essential for this application. We must be able to search the word list quickly, or the program will be unusable for even the slowest typists. We will probably check word spellings in a separate thread that must keep up with the changes to the document or file.* We must keep the words in memory since disk access will degrade performance too much. A typical English word list contains more than 200,000 words and could be stored in an array that would use about 2.5MB memory (this includes space for the words themselves and a list of pointers to the words, as shown in Figure 5-5). We use pointers because word lengths vary and we want to optimize memory usage.

We know from the earlier section “Binary Search” that we can expect about 18 string comparisons†on average if we use BINARYSEARCH. String comparisons can be expensive—even when we optimize the code, there is a loop that comparesbytes.

* Hash-based searches require the complete word before the search can begin. With a binary search approach, one may search for the work incrementally as it is being typed, but this introduces ad- ditional complexity to the program.

Figure 5-5. Storing strings for hash-based searching

† log (200,000) = 17.61

Algorithms in a Nutshell
Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
9780596516246 Publisher: O'Reilly Media, Inc.

Prepared for Ming Yi, Safari ID: [email protected]
Licensed by Ming Yi
Print Publication Date: 2008/10/21 User number: 594243
© 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use

Algorithms in a Nutshell

Get our desktop app

Company

Features

Documentation

Resources