Algorithms in a Nutshell


Chapter 5: Searching
Assuming we ensure that we don't revisit a slot, the worst-case performance of this
approach is O(b). Open addressing performs well for searching. The expected
number of probes in an unsuccessful search is 1/(1−α), and the expected number
of probes in a successful search is at most (1/α) ln(1/(1−α));* see (Cormen et al.,
2001) for details. One finds two common types of probing with open addressing.
The first is linear probing, where our hash function is h(u,j) = (h'(u) + j) mod m.
When we have a collision, we simply look at the next slot in the hash table, using
it as a circular structure. This approach is susceptible to clusters of elements
forming that lead to long sequences of slots that must be searched, especially if α
is close to 1.0. To mitigate this clustering we might use quadratic probing, in
which our hash function is of the form h(u,j) = (h'(u) + f(j)) mod m, where f is a
quadratic function of j.
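As a sketch, both probe sequences can be written directly from these formulas. The class and method names here (ProbingDemo, baseHash, and so on) are illustrative, and the choice of f(j) = j² for quadratic probing is one common option, not the only one:

```java
// Sketch of open addressing with linear and quadratic probing.
// M (the table size) and baseHash (playing the role of h') are assumptions.
public class ProbingDemo {
    public static final int M = 11;               // number of slots; a prime helps quadratic probing
    public static final String[] table = new String[M];

    public static int baseHash(String u) {        // stands in for h'(u)
        return Math.abs(u.hashCode()) % M;
    }

    // Linear probing: h(u, j) = (h'(u) + j) mod m
    public static int linearProbe(String u, int j) {
        return (baseHash(u) + j) % M;
    }

    // Quadratic probing: h(u, j) = (h'(u) + f(j)) mod m, with f(j) = j*j here
    public static int quadraticProbe(String u, int j) {
        return (baseHash(u) + j * j) % M;
    }

    // Insert using linear probing; after M failed probes every slot was visited.
    public static void insert(String u) {
        for (int j = 0; j < M; j++) {
            int slot = linearProbe(u, j);
            if (table[slot] == null) { table[slot] = u; return; }
        }
        throw new IllegalStateException("table full");
    }
}
```

Bounding the loop at M probes is what enforces the "don't revisit a slot" assumption and the O(b) worst case above.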
In Table 5-5 we saw that there are tremendous savings to be had by enforcing that the hash
table does not rehash its contents. A rehash operation is made possible if the hash
table can be constructed with a target load factor before any elements are added
to it. When the hash table contains more elements than it was designed for, it can
resize itself. The typical way to do this is to double the number of bins and add
one (since hash tables usually contain an odd number of bins). Once more bins
are available, all existing elements in the hash table must be rehashed to be prop-
erly placed in the new structure. This is an expensive operation that should reduce
the overall cost of future searches, but it must be run infrequently; otherwise, the
performance of the hash table will degrade. You should allow a hash table to
rehash its contents when you are unsatisfied with the ability of the hash function
to assign elements evenly within the hash table. The default target load factor for
the java.util.Hashtable class is .75; if you set this value to n/b, then the hash
table will never call rehash.
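A minimal sketch of this resize policy, using separate chaining for the bins; the class name, the chaining layout, and the 0.75 threshold (borrowed from java.util.Hashtable's default) are assumptions for illustration, not the book's code:

```java
import java.util.LinkedList;

// Sketch of a chained hash table that rehashes when the load factor
// would exceed a target of 0.75. All names are hypothetical.
public class RehashingTable {
    private LinkedList<String>[] bins;
    private int count = 0;
    private final double targetLoadFactor = 0.75;

    @SuppressWarnings("unchecked")
    public RehashingTable(int b) { bins = new LinkedList[b]; }

    private int slot(String key, int b) { return Math.abs(key.hashCode()) % b; }

    public void add(String key) {
        // Resize before the load factor n/b would pass the target.
        if ((double) (count + 1) / bins.length > targetLoadFactor) rehash();
        int i = slot(key, bins.length);
        if (bins[i] == null) bins[i] = new LinkedList<>();
        bins[i].add(key);
        count++;
    }

    // Double the number of bins and add one, then reinsert every element,
    // since each element's slot depends on the new bin count.
    @SuppressWarnings("unchecked")
    private void rehash() {
        LinkedList<String>[] old = bins;
        bins = new LinkedList[2 * old.length + 1];
        for (LinkedList<String> chain : old) {
            if (chain == null) continue;
            for (String key : chain) {
                int i = slot(key, bins.length);
                if (bins[i] == null) bins[i] = new LinkedList<>();
                bins[i].add(key);
            }
        }
    }

    public int binCount() { return bins.length; }
    public int size() { return count; }
}
```

The expensive step is the full reinsertion loop in rehash, which is why the operation must remain infrequent relative to ordinary adds.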
The example shown previously in the “Context” section used a fixed set of strings
for the hash table. When confronted with this special case, we can achieve truly
optimal performance by using perfect hashing. Perfect hashing uses two hash func-
tions. We use a standard hash function to index into the primary table, A. Each
slot, A[i], points to a smaller secondary hash table, Si, that has an associated hash
function hi. If there are k keys that hash to slot A[i], then Si will contain k²
slots. This seems like a lot of wasted memory, but judicious choice of the initial
hash function can reduce this to an amount similar to previous variations. The
selection of appropriate hash functions guarantees that there are no collisions in
the secondary tables. This means we have an algorithm with constant perfor-
mance—O(1). Details on the analysis of perfect hashing can be found in (Cormen
et al., 2001). Doug Schmidt (1990) has written an excellent paper on perfect
hashing generation and there are freely available downloads of perfect hash func-
tion generators in various programming languages.
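The two-level lookup can be sketched as follows for integer keys. The universal hash family ((a·x + b) mod p mod m), the fixed random seed, and all class and field names are assumptions made for illustration; the point is that construction retries each secondary hash function hi until the k² slots of Si hold that bucket's keys without collision, after which a lookup needs exactly two hash evaluations:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch of two-level perfect hashing for a fixed set of distinct,
// nonnegative integer keys. Names and the hash family are assumptions.
public class PerfectHash {
    private static final int P = 1_000_003;    // prime larger than any key
    private final Random rnd = new Random(42); // fixed seed for reproducibility
    private final int m;                       // primary table size
    private final int a0, b0;                  // primary hash parameters
    private final int[][] secondary;           // S_i tables; -1 marks an empty slot
    private final int[] secA, secB;            // parameters of each h_i

    private static int h(int a, int b, int x, int size) {
        return (int) ((((long) a * x + b) % P) % size);
    }

    public PerfectHash(int[] keys) {
        m = keys.length;
        a0 = 1 + rnd.nextInt(P - 1);
        b0 = rnd.nextInt(P);
        // Group the keys by primary slot A[i].
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());
        for (int k : keys) buckets.get(h(a0, b0, k, m)).add(k);
        secondary = new int[m][];
        secA = new int[m];
        secB = new int[m];
        for (int i = 0; i < m; i++) {
            List<Integer> bucket = buckets.get(i);
            int size = bucket.size() * bucket.size();   // k^2 slots for k keys
            secondary[i] = new int[Math.max(size, 1)];
            retry:
            while (true) {
                Arrays.fill(secondary[i], -1);
                secA[i] = 1 + rnd.nextInt(P - 1);
                secB[i] = rnd.nextInt(P);
                for (int k : bucket) {
                    int s = h(secA[i], secB[i], k, secondary[i].length);
                    if (secondary[i][s] != -1) continue retry;  // collision: pick a new h_i
                    secondary[i][s] = k;
                }
                break;   // h_i is collision-free on this bucket
            }
        }
    }

    // Exactly two hash evaluations, so O(1) worst-case lookup.
    public boolean contains(int key) {
        int i = h(a0, b0, key, m);
        int[] t = secondary[i];
        return t[h(secA[i], secB[i], key, t.length)] == key;
    }
}
```

With k² slots per bucket, a randomly chosen hi is collision-free with probability greater than 1/2, so the retry loop terminates quickly in expectation.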
In general, although there may be many potential elements ei associated with a
specific key value k, the hash table A might be designed to store just a single
one of these. That is, if ei ∈ C and ej ∈ C, then i = j if and only if key(ei) = key(ej).
The reason for this restriction is to enable lookup for a single element e when



  • The ln function computes the natural logarithm of a number in base e.
    Algorithms in a Nutshell, by Gary Pollice, George T. Heineman, Stanley Selkow.
    ISBN: 9780596516246. Publisher: O'Reilly Media, Inc. Print publication date: 2008/10/21.