Hash-based Search
All HASH-BASED SEARCH algorithms share the first two components; different
behaviors come about when variations to collision handling are employed.
The cost of computing the hash value must be bounded by a fixed, constant upper
bound. In the example in this section, computing the hash value was
proportional to the length of the string. For any finite collection of words, there is
a longest string with length k. If t_k is the time it takes to compute the hash value
for the longest string, then computing any hash value requires at most t_k.
Computing the hash value is therefore considered to be a constant time operation.
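This bounded-cost argument can be made concrete with a short sketch. The multiplier 31 and the names below are illustrative assumptions, not necessarily the scheme of Example 5-6:

```java
public class StringHash {
    // One pass over the characters: the cost is proportional to the
    // string's length, and so bounded by t_k, the cost for the longest
    // word in any fixed dictionary.
    static int hash(String s, int numBins) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);      // constant work per character
        }
        return (h & 0x7fffffff) % numBins; // clear sign bit, reduce to a bin index
    }

    public static void main(String[] args) {
        int b = 1045875;                   // table size used later in this section
        System.out.println(hash("algorithm", b));
    }
}
```

Because the loop body does constant work, the hash of any word in the collection costs at most one loop iteration per character of the longest word.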
The second part of the algorithm also performs in constant time. If the table is
stored on secondary storage, there may be a variation that depends upon the posi-
tion of the element and the time required to position the device, but this has a
constant upper bound.
If we can show that the third part of the computation also has a constant upper
bound, then we can easily prove that the time performance of hash-based
searching is constant. When we use chaining, we use the load factor, α = n/b, where
b is the number of bins in the hash table and n is the number of elements stored in
the hash table. The load factor gives the average number of elements in each bin's
linked list.
The worst-case performance for finding an element by chaining is O(n), which
occurs when all elements hash to the same bin in the table. The average number
of elements to examine in a bin is α. For a detailed analysis, see Chapter 11 of
(Cormen et al., 2001).
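As a hedged sketch of the chaining structure this analysis describes (the class and method names here are illustrative, not the book's Example 5-8):

```java
import java.util.LinkedList;

// Minimal chained hash table. search() walks a single bin's linked list,
// so the average number of elements examined tracks the load factor
// α = n/b; the worst case, all n elements in one bin, degrades to O(n).
public class ChainedTable {
    private final LinkedList<String>[] bins;
    private int n = 0;                        // number of stored elements

    @SuppressWarnings("unchecked")
    ChainedTable(int b) {
        bins = new LinkedList[b];
        for (int i = 0; i < b; i++) bins[i] = new LinkedList<>();
    }

    private int bin(String key) {
        return (key.hashCode() & 0x7fffffff) % bins.length;
    }

    void add(String key) {
        bins[bin(key)].add(key);
        n++;
    }

    boolean search(String key) {
        for (String s : bins[bin(key)])       // list length averages α = n/b
            if (s.equals(key)) return true;
        return false;
    }

    double loadFactor() { return (double) n / bins.length; }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(8);
        t.add("apple");
        t.add("banana");
        System.out.println(t.search("apple") + " " + t.loadFactor());
    }
}
```

Keeping α small (by choosing b large relative to n) is what makes the average search cost effectively constant.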
Table 5-5 compares the performance of the code from Example 5-8 with the
existing JDK class java.util.Hashtable on hash tables of different sizes. For the
tests labeled p = 1.0, each of the 213,557 words is used as the target item to ensure
that the word exists in the hash table. For the tests labeled p = 0.0, each of these
words has its last character replaced with a * to ensure that the word does not
exist in the hash table. Note also that we keep the size of the search words the
same in both cases to ensure that the cost of computing the hash is identical.
We ran each test 100 times and discarded the best- and worst-performing
trials. The average of the remaining 98 trials is shown in Table 5-5. To under-
stand these results, we produce statistics on the hash tables we create, shown in
Table 5-6. As the load factor goes down, the average length of each element linked
list also goes down, leading to improved performance.
As the size of the hash table approximately doubles, the time to find an item
decreases, because the linked lists of elements are shorter. Indeed, by the time
b = 1,045,875, no linked list contains more than five elements. Because a hash table
can typically grow large enough to ensure that all linked lists of elements are
small, its search performance is considered to be O(1). However, this is contin-
gent (as always) on having sufficient memory and a suitable hash function to
disperse the elements throughout the bins of the hash table. About 81% of the
213,557 words are mapped to unique hash table bins when b = 1,045,875 using the
hash function from Example 5-6.
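A dispersion statistic like that 81% figure could be gathered along the following lines. This is a hedged sketch: the hash and word list below are placeholders, not the book's Example 5-6 function or its 213,557-word corpus.

```java
// Count the fraction of words that end up alone in their bin, a simple
// measure of how well a hash function disperses a word list over b bins.
public class Dispersion {
    static int bin(String s, int b) {
        return (s.hashCode() & 0x7fffffff) % b;   // placeholder hash, not Example 5-6
    }

    static double uniqueFraction(String[] words, int b) {
        int[] counts = new int[b];
        for (String w : words) counts[bin(w, b)]++;
        int unique = 0;
        for (String w : words)
            if (counts[bin(w, b)] == 1) unique++; // word is alone in its bin
        return (double) unique / words.length;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c"};
        System.out.println(uniqueFraction(words, 1000));
    }
}
```

Applied to the full word list with b = 1,045,875 and the book's hash function, this kind of count is what yields a figure like 81%.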
Algorithms in a Nutshell, by Gary Pollice, George T. Heineman, and Stanley Selkow. O'Reilly Media, Inc. ISBN 9780596516246.
Print Publication Date: 2008/10/21