Computer Science 461 Hashing Experiment
NOTE: Parts 1 & 2 are to be handed in for the first part (worth 25%) of this assignment.
 
PART 1:
You are given a set of 800 keys. They are University ID numbers in the range 720000 - 999999. Most, but not all are in the range 960000 - 999999.
1. Choose at least 3 different hashing algorithms for mapping these keys onto addresses. Describe them and explain why you chose them. This will include an outline of the algorithm and relevant advantages and disadvantages. They can be discussed separately or together. Simple array structures are sufficient for manipulating and storing keys.
2. Implement (in C or C++ ONLY - use the g++ compiler in both cases) and run each of the three on the 800 keys. Report your results:
- collisions
- program execution times (see man entry for clock)
Choose the one algorithm that you think is best for part 2. Justify your choice.
PART 2:
1. Implement 2 different collision resolution techniques (c.r.t.).
2. Load 2 tables with identical keys (one for each c.r.t.)
3. Explain your choices w.r.t.:
- table size
- overflow area or not
- bucket size, etc.
4. Predict expected average and maximum chain lengths for successful searches and for keys not found.
PART 3:
1. Write 2 programs ( each using one of the collision resolution techniques from part 2) that will read a list of keys and search for each of the keys in turn.
2. Keep track of the following information:
- Ave. Search length (with and without keys not found)
- Max. search length (separately for keys found and keys not found)
3. Have your program write the following information into a running log file:
- date (dd/mm/yy)
- time
(hh.mm.ss)
- which server you are logged in to
(csa, csb,...) (see gethostname)
- what is the overall system load and what is the load on your current server

- total program run time (see popen with uptime)
- individual search run times (as an average of N requests)

(this information should be appended to the log file each time the program is run)
4. Run each program at least 10 times ( different times of day, different machines, different system loads)
PART 4:
1. Plot your results as time vs. load. What does this tell you? How can you normalize these results?
2. What else can be said about the results?

Back to Top
Copyright © Katrin Becker 1998, 1999 Last Modified