Algorithms in a Nutshell

(Tina Meador) #1

(^116) | Chapter 5: Searching
Variations
If you wanted to support a “search-or-insert” operation, then the final value ofix
at line 10 in Figure 5-2 identifies the index value into which the missing element
can be inserted (and all higher indexed values would need to be bumped up).
There are two primary variations on BINARYSEARCH. The first involves dealing
with dynamic data where one must tune the implementation to allow efficient
insertions and deletions into the collection while maintaining acceptable search
performance. If an array is used to store the collection, insertions and deletions are
quite inefficient, since every array entry should contain a valid element. Therefore,
inserting involves extending the array (physically or logically) and pushing on average
half of the elements ahead one slot. Deletion requires shrinking the array and moving
half of the elements one index position lower. Neither of these is acceptable.
As long as the collection fits in memory, a good choice is to switch to a hash-
based search approach usingcollision chaining. See the later section “Hash-based
Search,” which describes a straightforward approach to searching dynamic data.
An alternate method is to create a binary search tree in memory. This approach
can be simple to implement if the insertions and deletions are random enough
that the tree does not become too biased. However, experience has shown that
this is rarely the case, and a more complex type of search tree—a balanced binary
search tree—must be used (Cormen et al., 2001).
The second variation addresses the case in which the data is both dynamic and
too large to fit into memory. When this occurs, the search time is dominated by
input/output operations to secondary storage. One effective solution is to use an
n-ary tree called a B-Tree. This is a multilevel tree that is fine-tuned for perfor-
mance on secondary storage. A complete analysis of B-Trees can be found in
(Cormen et al., 2001). A helpful online B-Tree tutorial with examples can be
found athttp://www.bluerwhite.org/btree.
Hash-based Search
The previous sections on searching apply in specific cases that require a small
number of elements (SEQUENTIALSEARCH) or an ordered collection (BINARY
SEARCH). We need more powerful techniques for searching larger collections that
are not necessarily ordered. One of the most common approaches is to use ahash
functionto transform one or more characteristics of the searched-for item into a
value that is used to index into an indexed hash table. Hash-based searching has
better average-case performance than the other search algorithms described in this
chapter. Many books on algorithms discuss hash-based searching under the topic
ofhashtables(Chapter 11 in Cormen et al., 2001); you may also find this topic in
books on data structures that describe hash tables.
In HASH-BASEDSEARCH(Figure 5-3), thenelements of a collectionCare first
loaded into a hash tableAthat hasbbins. The concept of akeyenables this to
happen. Each elemente∈Ccan be mapped to a key valuek=key(e) such that if
ei=ejthenkey(ei)=key(ej).*A hash functionh=hash(e) uses the key valuekey(e)to



  • Note that the reverse is not necessarily true since key values do not have to be unique.
    Algorithms in a Nutshell
    Algorithms in a Nutshell By Gary Pollice, George T. Heineman, Stanley Selkow ISBN:
    9780596516246 Publisher: O'Reilly Media, Inc.
    Prepared for Ming Yi, Safari ID: [email protected]
    Licensed by Ming Yi
    Print Publication Date: 2008/10/21 User number: 594243
    © 2009 Safari Books Online, LLC. This PDF is made available for personal use only during the relevant subscription term, subject to the Safari Terms of Service. Any other use

Free download pdf