the table serially, you get a jumpstart to the likeliest element to contain your value.
This is achieved by loading the table carefully, and not in serial order. What you do instead
is apply some kind of transformation (known as a hashing function) on a data value from an
element to be stored. The hashing function will yield a value in the range 0...tablesize-1,
and that becomes the index where you try to store that record.
If the slot is already taken, search forward from that point in the table for the next empty
slot.
Alternatively, you can set up a linked list hanging off that element, and simply add it to the
end (either end, by the way). Or you can even hang a second hash table off the element.
When you look up a data item, you don't need to start searching entries from element zero.
Instead, again hash the value you want to locate, and start looking from that point in the
table.
Hashing is a tried and tested table lookup optimization, and it's used everywhere in systems
software: in databases, operating systems, and compilers.
If I were stranded on a desert island and could only take one data structure with me, it
would be the hash table.
A colleague had to write a program that at one point stored filenames and information about each file.
The data was stored in a table of structs, and he decided to use hash lookup. Here's where the coding
for debuggability came in. He didn't try to get every part of the program working in one swell foop.
He got it working for the simplest case first, by making the hash function always return the constant
zero. The function looked like this:
/* hash_file: Placeholder for more sophisticated future
routine */
int hash_filename (char *s)
{
return 0;
}
The code that called it looked like this: