Advanced Programming in the UNIX® Environment

(lily) #1
ptg10805159

748 ADatabase Library Chapter 20


To find a record in the database given its key,db_fetchcalculates the hash value
of the key,which leads to one hash chain in the hash table. (Thechain ptrfield could be
0, indicating an empty chain.)We then follow this hash chain, which is a linked list of
all the index records with this hash value. When we encounter achain ptrvalue of 0,
we’ve hit the end of the hash chain.
Let’s look at an actual database file. The program in Figure20.3 creates a new
database and writes three records to it. Since we storeall the fields in the database as
ASCII characters, we can look at the actual index file and data file using any of the
standardUNIX System tools:

$ls -l db4.*
-rw-r--r-- 1 sar 28 Oct 19 21:33 db4.dat
-rw-r--r-- 1 sar 72 Oct 19 21:33 db4.idx
$cat db4.idx
0 53 350
010Alpha:0:6
010beta:6:14
17 11gamma:20:8
$cat db4.dat
data1
Data for beta
record3

To keep this example small, we have set the size of each ptrfield to four ASCII
characters; the number of hash chains is set to 3. Since eachptris a file offset, a
four-character field limits the total size of the index file and data file to 10,000 bytes.
When we do some performance measurements of the database system in Section 20.9,
we set the size of eachptrfield to six characters (allowing file sizes up to 1 million bytes)
and the number of hash chains to morethan 100.
The first line in the index file
0 53 350
consists of the free-list pointer (0, the free list is empty) and the three hash chain
pointers(53, 35, and 0).The next line
010Alpha:0:6
shows the format of each index record. The first field( 0 )is the four-character chain
pointer.This record is the end of its hash chain. The next field( 10 )is the four-character
idx len,the length of the remainder of this index record. Weread each index record
using tworeads: one to read the two fixed-size fields (thechain ptrandidx len)and
another to read the remaining (variable-length) portion. The remaining three
fields—key,dat off,anddat len—aredelimited by a separator character (a colon in this
case). Weneed the separator character,since each of these three fields is variable length.
The separator character can’t appear in the key.Finally,anewline terminates the index
record. The newline isn’t required, sinceidx lencontains the length of the record. We
storethe newline to separate each index record so we can use the normal UNIX System
tools, such ascatandmore,with the index file. Thekeyis the value that we specified
Free download pdf