Algorithms in a Nutshell

(Tina Meador) #1

(^122) | Chapter 5: Searching
Storage space poses another design issue for HASH-BASEDSEARCH. The primary
storage,A, must be large enough to hold all of the search keys with enough space
left over for storing the collision keys. The size ofAis typically chosen to be a
prime number. However, we can use any number when we are not using open
addressing (see the upcoming “Variations” section). A good choice in practice is
2 k–1, even though this value isn’t always prime. The elements stored in the hash
table have a direct effect on memory. Consider how Figure 5-3 stores each string
element in a linked list, so the elements of the array that serves asAare linked list
objects. These are pointers to objects on the heap. Each list has overhead storage
that contains pointers to the first and last elements of the list and, if you use the
LinkedListclass from the Java JDK, a significant amount of additional fields and
classes that make the implementation quite flexible. One could write a much
simpler linked list class that provides only the necessary capabilities, but this
certainly adds additional cost to the implementation of the hash-based searching
algorithm.
If you use theLinkedListclass, each non-empty element ofAwill require 12 bytes
of memory, assuming that the size of a pointer is four bytes. Each string element is
incorporated into aListElementthat requires an additional 12 bytes. For the
previous example of 213,557 words, we require 5,005,488 bytes of memory
beyond the actual string storage. The breakdown of this is:



  • Size of the primary table: 1,048,572 bytes

  • Size of 116,186 linked lists: 1,394,232 bytes

  • Size of 213,557 list elements: 2,562,684 bytes
    Storing the strings also has an overhead if you use the JDKStringclass. Each
    string has 12 bytes of overhead. We can therefore add 213,557*12 = 2,562,684
    additional bytes to our overhead. So, the algorithm chosen in the example
    requires 7,568,172 bytes of memory to support it. The actual number of char-
    acters in the strings in the word list we used in the example is only 2,099,075.


Figure 5-6. Handling collisions with lists

Algorithms in a Nutshell
Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
9780596516246 Publisher: O'Reilly Media, Inc.


Prepared for Ming Yi, Safari ID: [email protected]
Licensed by Ming Yi
Print Publication Date: 2008/10/21 User number: 594243
© 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use

Free download pdf