Algorithms in a Nutshell

(Tina Meador) #1

(^120) | Chapter 5: Searching
Sometimes we write these loops by hand in an assembly language to ensure that
we optimize for the specific architecture, such as making sure that we don’t stall
the instruction pipeline in the more common cases and unroll the loop to fill the
pipeline when possible. One goal, therefore, is to minimize the number of string
comparisons.
We first need to define a function to compute the key for a string,s, in the word
list. One goal for the key function is to produce as many different values as
possible but it is not required for the values to all be unique. A popular technique
is to produce a value based on each piece of information from the original string:
key(s)=s[0]31(len–1)+s[1]31(len–2)+ ... +s[len–1]
wheres[i] is theithcharacter (as a value between 0 and 255) andlenis the length
of the strings. Computing this function is simple as shown in the Java code in
Example 5-6 (adapted from the Open JDK source code), wherecharsis the array
of characters that defines a string.*By our definition, thehashCode() method for
thejava.lang.String class is thekey() function.
Because thishashCodemethod tries to be efficient, it caches the value of the
computed hash to avoid recomputation (i.e., it computes the value only ifhash is 0).
Next, we construct the hash table. We havenstrings but what should be the size
of the hash tableA? In an ideal case,Acould beb=nbins where the hash function
is a one-to-one function from the set of strings in the word collection onto the
integers [ 0 ,n). This does not occur in the normal case, so we instead try to have a
hash table that has as few empty bins as possible. If ourhashfunction distributes
the keys evenly, we can achieve reasonable success by selecting an array size
approximately as large as the collection. We definehash(s)=key(s)%b, where % is
the modulo operator that returns the remainder when dividingkey(s) byb.
The advanced reader should, at this point, question the use of a basic hash func-
tion and hash table for this problem. Since the word list is static, we can do better
by creating aperfect hash function. A perfect hash function is one that guarantees
no collisions for a specific set of keys. In this case a perfect hash function can be
used; this is discussed in the upcming “Variations” section. Let’s first try to solve
the problem without one.



  • The code can be downloaded from the Open JDK website athttp://openjdk.java.net.
    Example 5-6. Sample Java hashCode
    public int hashCode( ) {
    int h = hash;
    if (h == 0) {
    for (int i = 0; i < chars.length; i++) {
    h = 31*h + chars[i];
    }
    hash = h;
    }
    return h;
    }
    Algorithms in a Nutshell
    Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
    9780596516246 Publisher: O'Reilly Media, Inc.
    Prepared for Ming Yi, Safari ID: [email protected]
    Licensed by Ming Yi
    Print Publication Date: 2008/10/21 User number: 594243
    © 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use

Free download pdf