Computer Science 461Signature Hashing Experiment
Basic Idea:
- To look at 2 collision resolution techniques (one of them being Signature Hashing) and compare them w.r.t. predicted and real performance.
 
In the course directory is a sub-directory called 'SigHash' which contains the following:
thekeys : a file of keys to load. They are University ID numbers in the range 770000 - 999999., and 200000-229999. Most, but not all are in the range 980000 - 999999, and 200000-229999.
K1 : list of keys to search for (including non-existent keys)
K2 : a second list of keys to search for (including non-existent keys)
load1 : a program to load keys using a traditional method
search1 : a program to search for the given keys (keeps track of search length information)
 
 
PART 1:
Given the number of keys you have to work with (800) and your chosen table size predict the expected maximum chain lengths for successful searches and for keys not found using two of the traditional collision resolution techniques (one chained and one open).
 
PART 2:
Write a program to implement the Signature Hashing Technique. Use it to load the given keys. Keep track of search lengths (probes into both the separator table and the hash table)
The given keys range as follows:
770000-939999     5%
940000-959999   15%
960000-999999   50%
200000-229999   30%
 
You may write two programs: one to do the load and one to search for keys
 
PART 3:
Now run an "experiment" that compares signature hashing against the given method (code provided) w.r.t.
- program efficiency
- actual search lengths (which can be thought of as roughly equivalent to a count of seeks)
    {use both time and gprof; see lab notes and man pages for info}
- Have your program write the following information into a running log file:
- date (dd/mm/yy)
- time
(hh.mm.ss)
- which server you are logged in to
(csa, csb,...) (see gethostname)
- what is the overall system load and what is the load on your current server

- total program run time (see popen with uptime)
- individual search run times (as an average of N requests)

(this information should be appended to the log file each time the program is run)
- run each method at least 10 times (different times of day, different machines, different loads). Be sure to run both methods under similar circumstances each time so you'll have something reasonable to compare
 
PART 4:
1. Plot your results as execution time vs. load. What does this tell you? How can you normalize these results?
2. What could be done to your program to make it more efficient?
3. What else can be said about the results?

Back to Top
Copyright © Katrin Becker 1998, 1999 Last Modified