CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified November 19, 2001 02:09 PM

Signature Hashing
     
[reference: Larson, P-A, and Ajay Kajla, "File Organization: Implementation of a Method Guaranteeing Retrieval in One Access", CACM Vol 27, No. 7 (July 1984), pp. 670-677]
 
- combines hashing with the use of signatures.
- uses a separate hash table (which is to be stored in memory)
 
- idea is to use a hash table (in memory) to search for a likely location of a given key, thereby reducing data file accesses to 1 (for both keys found and not found)
 
- normally in hashing the order of the synonyms in a probe chain is determined by the order of insertion or by movement of records due to subsequent insertions (like Brent's method).
 
With signature hashing the order of the records is determined by their signature values.
There are 2 types of signatures involved here:
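1. position-key [p-k] signature: computed from the key and its position along its probe chain, so it changes with each position
2. separator signature: stored per address; any key whose p-k signature at that address is larger than the separator was placed farther along its probe chain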
 
Each key has a "standard" probe order defined by a first and second hash function - the first determines the home address and the second determines the 'step size' for jumps along the chain:
e.g. KEY = 43
H1(key) = 7
H2(key) = 2
 
so the probe chain is 7, 9, 11, ... (including wrap-around)
 
There is a matching sequence of signatures produced for each location along the probe chain:
S0(key) = key mod 15, for the home address
Si(key) = ((S_{i-1}(key) + 1) * key) mod 15, for the others
 
When searching for a key we start at the home address and, looking in the signature table, compare the key's signature with the separator stored for that address:
- If S(key) < separator, this is where the key should be. Now we look in the data file: if the record is not there, it does not exist in the file.
- If S(key) > separator, we go to the next step along the probe chain and check that separator. The key has a new signature for this location. (We don't look in the data file at all, so no seeks.)
- If S(key) = separator, the key may be at this location and we need to look. This is the only situation where we may have to look in the file more than once: if the key is not found here, we go to the next place in the probe chain and check again.
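The three search cases can be sketched as a single comparison. This is only an illustration: the function name and the returned labels are made up for this sketch and do not come from the Larson and Kajla paper.

```python
def search_step(signature, separator):
    """Decide what a search does at one probe position.

    "look"           - the record, if it exists at all, is at this position
    "next"           - skip the data file and move along the probe chain
    "look-then-next" - tie: read the data file, and keep probing on a miss
    (The three labels are hypothetical names chosen for this sketch.)
    """
    if signature < separator:
        return "look"
    if signature > separator:
        return "next"
    return "look-then-next"


# A few (signature, separator) pairs and the resulting decisions.
for sig, sep in [(2, 8), (9, 8), (0, 0)]:
    print(sig, sep, search_step(sig, sep))
```

Only the tie case can cost more than one read of the data file, which is why a well-spread signature function matters.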
When trying to place a key we again start at the home address and compare the key's signature with the separator in the signature table:
- If S(key) > separator, we go to the next step along the probe chain and check that separator. The key has a new signature for this location. (We don't look in the data file at all, so no seeks.)
- If S(key) <= separator, this is where the key should go. Now we look in the data file: if there is no room (a collision), we compare the signature of the incoming key against the signature of the resident key. The one with the smaller signature stays, and we set the corresponding separator to one less than the larger of the two signatures. If the signatures are equal, the separator is set to that value and the resident key stays. The one with the larger signature gets placed farther along its own probe chain.
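The collision rule can be written down as a small helper. This is a sketch under one assumption: on a tie the resident record keeps the slot, which is how ties are handled in the worked example in these notes. The function name is hypothetical.

```python
def resolve_collision(incoming_sig, resident_sig):
    """Return (incoming_stays, new_separator) for an occupied slot.

    The record with the smaller signature keeps the slot. Otherwise the
    separator becomes one less than the larger signature. On a tie the
    resident stays (assumption taken from the worked example) and the
    separator becomes the shared signature value.
    """
    if incoming_sig == resident_sig:
        return False, incoming_sig
    incoming_stays = incoming_sig < resident_sig
    return incoming_stays, max(incoming_sig, resident_sig) - 1


print(resolve_collision(10, 14))  # incoming key has the smaller signature
print(resolve_collision(9, 5))    # resident key has the smaller signature
print(resolve_collision(0, 0))    # tie
```

Whichever record loses the slot continues along its own probe chain, where it gets a new signature.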
 

This is most easily explained by example:
 
We start off with a file of records located by the typical hashing functions. Initially all separators are set to the maximum separator value (15 here, which is larger than any signature). We will use the usual hash function and i-value function to find locations:
F(key) = key mod 11
i(key) = Quotient(key/11) mod 11
 
 addr  Key Records
0  22  
 1  34  
 2  13  
 3  25  
The function for the p-k signature will be
S0(key) = key mod 15
Si(key) = ((S_{i-1}(key) + 1) * key) mod 15
This will give us a value in 4 bits guaranteed to be less than 15. In reality this is not a good signature function - it's not particularly random. In practice we would use something like:
r_{i+1} = (a * r_i + d) mod 2^32
where a = 3141592653 and d = 2718281829 and r_0 = key
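A minimal sketch of the recurrence above. The notes do not say how a 32-bit value r is reduced to a short signature, so taking the top bits is an assumption of this sketch, as are the names `signature_stream` and `bits`. It is also an assumption that the first signature comes from r_1 rather than from r_0 = key.

```python
A = 3141592653
D = 2718281829
M = 2 ** 32


def signature_stream(key, bits=4):
    """Yield successive signatures for `key`.

    Uses r_{i+1} = (A * r_i + D) mod 2**32 with r_0 = key. Keeping the top
    `bits` bits of each r is an assumption of this sketch.
    """
    r = key % M
    while True:
        r = (A * r + D) % M
        yield r >> (32 - bits)


stream = signature_stream(43)
print([next(stream) for _ in range(4)])
```

With `bits=4` the signatures range over 0..15, so the initial separator value would have to be larger than 15 to stay above every signature.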
 
Remember that signatures must be linked to probe chains (i.e. S depends on how far away k is from its home address) and that each key has its own probe sequence (which is a specific permutation of all the addresses in the file).
Here are the keys we wish to insert:
17, 35, 14, 16, 25, 29, 24, 27, 36, 13, 28 { the last 2 will not be inserted, merely searched for}
 key         j=0  j=1  j=2  j=3
 17   Hi,j     6    7    8    9
      Si,j     2    6   14    0
 35   Hi,j     2    5    8    0
      Si,j     5    0    5    0
 14   Hi,j     3    4    5    6
      Si,j    14    0   14    0
 16   Hi,j     5    6    7    8
      Si,j     1    2    3    4
 25   Hi,j     3    5    7    9
      Si,j    10    5    0   10
 29   Hi,j     7    9    0    2
      Si,j    14    0   14    0
 24   Hi,j     2    4    6    8
      Si,j     9    0    9    0
 27   Hi,j     5    7    9    0
      Si,j    12    6    9    0
 36   Hi,j     3    6    9    1
      Si,j     6   12    3    9
 13   Hi,j     2    3    4    5
      Si,j    13    2    9   10
 28   Hi,j     6    8   10    1
      Si,j    13    2    9   10
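The probe addresses and signatures listed above can be regenerated from F(key), i(key) and the p-k signature function. The following is a small sketch; the function and constant names are chosen for illustration only.

```python
TABLE_SIZE = 11
SIG_MODULUS = 15


def probe_addresses(key, count=4):
    """First `count` addresses on the key's probe chain."""
    home = key % TABLE_SIZE                   # F(key)
    step = (key // TABLE_SIZE) % TABLE_SIZE   # i(key)
    # A step of 0 would never leave the home address; none of the
    # example keys produce one.
    return [(home + j * step) % TABLE_SIZE for j in range(count)]


def signatures(key, count=4):
    """First `count` p-k signatures: S0 = key mod 15, Si = ((S_{i-1} + 1) * key) mod 15."""
    sigs = [key % SIG_MODULUS]
    while len(sigs) < count:
        sigs.append(((sigs[-1] + 1) * key) % SIG_MODULUS)
    return sigs


for key in (17, 35, 14, 16, 25, 29, 24, 27, 36, 13, 28):
    print(key, probe_addresses(key), signatures(key))
```

Keys 13 and 28 share the same signature sequence because they are congruent mod 15, even though their probe chains differ.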

 addr separator  key
0 15  
 1  15  
 2  15  35
 3  15  14
 4  15  
 5  15  16
 6  15  17
 7  15  
 8  15  
 9  15  
 10  15  
 
 
Insert 17 @ 6; separator is OK
Insert 35 @ 2; separator is OK
Insert 14 @ 3; separator is OK
Insert 16 @ 5; separator is OK
Insert 25 @ 3; collision
p-k signature for 25 = 10, which is < separator, so we want to store it @3, but the slot is occupied, so we need to re-order the records in the chain
p-k(14) = 14; p-k(25) = 10, so 25 goes first; separator is set to the higher value - 1 = 13; now we try to insert 14 at its next location: 4; this works

 addr separator  key
0 15  
 1  15  
 2  15  35
 3  13  25
 4  15  14
 5  15  16
 6  15  17
 7  15  
 8  15  
 9  15  
 10  15  
Now if we want to find 14, we can do so by examining the separator table (which is stored separately from the file of records and should be held in RAM). To find 14: hash(14) = 3; separator @3 = 13; p-k(14) = 14, which is greater than the separator, so 14 must be farther down the chain; check the next separator. It's 15, which is > p-k1(14) = 0, so the record must be here. We looked at 2 separators but only looked in the file of records once.
 
Insert 29 @ 7; separator is OK
Insert 24 @ 2: collision; need to re-order keys so lowest goes first:
S0(24) = 9 S0(35) = 5; 35 stays
set Sep2 to 8; try next location for 24
H1(24) = 4 Goes to 4: collision with 14; need to re-order so
S1(24) = 0 S1(14) = 0
 

 addr separator  key
0 15  
 1  15  
 2  8  35
 3  13  25
 4  15  14
 5  15  16
 6  15  17
 7  15  29
 8  15  
 9  15  
 10  15  
  We leave 14; set the separator to 0 and move 24 along.
Next place for 24 is @6; collision so
S2(24) = 9 S0(17) =2;
24 is higher; leave 17; set separator to 8; move 24 along
Next place for 24 is @8; free
place it

 addr separator  key
0 15  
 1  15  
 2  8  35
 3  13  25
 4  0  14
 5  15  16
 6 8  17
 7  15  29
 8  15  24
 9  15  
 10  15  
  Now if we want to find 24:
hash to 2: S0(24) = 9; sep2 = 8 [>] means look further
H1(24) = 4
go to 4: S1(24) = 0; sep4 = 0 [=] means look in: not found; must look further*
    { this messes up the one-access claim, but perhaps not by much }
H2(24) = 6
go to 6: S2(24) = 9; sep6= 8 [>] means look further
H3(24) = 8
go to 8: S3(24) = 0; sep8 = 15 [<] means must be here or doesn't exist: found
 
Chain length was 4 but we only looked into the file twice.


 addr separator  key
0 15  
 1  15  
 2  8  35
 3  13  25
 4  0  14
 5  11  16
 6 8  17
 7  13  27
 8  15  24
 9  15  29
 10  15  
 
Insert 27 @ 5: collision; need to re-order keys so lowest goes first:
S0(27) = 12 S0(16) = 1; 16 stays
set Sep5 to 11; try next location for 27
H1(27) = 7
Goes to 7: collision with 29; need to re-order so
S1(27) = 6 S0(29) = 14; move 29
place 27; set separator to 13
Goes to 9: free; place 29
 

 addr separator  key
0 15  27
 1  15  
 2  8  35
 3  9  36
 4  0  14
 5  4  16
 6 8  17
 7 5  25
 8  15  24
 9  8  29
 10  15  
Insert 36 @ 3: collision; need to re-order keys so lowest goes first:
S0(36) = 6 S0(25) = 10; move 25 place 36
set Sep3 to 9; try next location for 25
H1(25) = 5
Goes to 5: collision with 16; need to re-order so
S1(25) = 5 S0(16) = 1; 16 stays, move 25 along
set separator to 4
H2(25) = 7
Goes to 7: collision
S2(25) = 0 S1(27) = 6; place 25; move 27
set separator to 5; try next for 27
H2(27) = 9
Goes to 9: collision
S2(27) = 9 S1(29) = 0; move 27 along
set separator to 8; try next for 27
H3(27) = 0
Goes to 0: free! place it
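The whole insertion sequence can be replayed with a short simulation. This is a sketch, not the paper's algorithm verbatim: it assumes one record per address, lets the resident keep the slot on a tie, and stores each resident's position along its own chain so its current signature can be recomputed. All names are illustrative.

```python
TABLE_SIZE = 11
SIG_MODULUS = 15
MAX_SEP = 15  # larger than any signature (signatures are 0..14)


def address(key, j):
    """Address of the j-th position on the key's probe chain."""
    step = (key // TABLE_SIZE) % TABLE_SIZE
    return (key % TABLE_SIZE + j * step) % TABLE_SIZE


def signature(key, j):
    """p-k signature of the key at the j-th position on its chain."""
    s = key % SIG_MODULUS
    for _ in range(j):
        s = ((s + 1) * key) % SIG_MODULUS
    return s


def insert(key, slots, seps):
    """Insert key; slots[a] is None or (resident_key, resident_chain_index)."""
    j = 0
    while j < TABLE_SIZE:
        a, s = address(key, j), signature(key, j)
        if s > seps[a]:                 # separator says: not here, no file access
            j += 1
            continue
        if slots[a] is None:            # free slot: place the record
            slots[a] = (key, j)
            return
        res_key, res_j = slots[a]
        res_sig = signature(res_key, res_j)
        if s < res_sig:                 # incoming wins: the resident is displaced
            slots[a] = (key, j)
            seps[a] = res_sig - 1
            key, j = res_key, res_j + 1
        else:                           # resident stays (also on a tie)
            seps[a] = s if s == res_sig else s - 1
            j += 1
    raise RuntimeError("probe chain exhausted")


slots = [None] * TABLE_SIZE
seps = [MAX_SEP] * TABLE_SIZE
for k in (17, 35, 14, 16, 25, 29, 24, 27, 36):
    insert(k, slots, seps)
for a in range(TABLE_SIZE):
    print(a, seps[a], slots[a][0] if slots[a] else "")
```

Keeping the chain index next to each resident is a convenience of the sketch; a real file would either store it with the record or recompute it by walking the resident's probe chain.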
 

Let's try searching for each:
 
17 : H0(17) = 6; S0(17) = 2; sep6 = 8 [ < means LOOK ] FOUND
 
35 : H0(35) = 2; S0(35) = 5; sep2 = 8 [ < means LOOK ] FOUND
 
14 : H0(14) = 3; S0(14) =14; sep3 = 9 [ > means GO NEXT ]
14 : H1(14) = 4; S1(14) =0; sep4 = 0 [ = means LOOK ] FOUND
 
16 : H0(16) = 5; S0(16) = 1; sep5 = 4 [ < means LOOK ] FOUND
 
25 : H0(25) = 3; S0(25) =10; sep3 = 9 [ > means GO NEXT ]
25 : H1(25) = 5; S1(25) = 5; sep5 = 4 [ > means GO NEXT ]
25 : H2(25) = 7; S2(25) = 0; sep7 = 5 [ < means LOOK ] FOUND
 
29 : H0(29) = 7; S0(29) =14; sep7 = 5 [ > means GO NEXT ]
29 : H1(29) = 9; S1(29) = 0; sep9 = 8 [ < means LOOK ] FOUND
 
24 : H0(24) = 2; S0(24) =9; sep2 = 8 [ > means GO NEXT ]
24 : H1(24) = 4; S1(24) = 0; sep4 = 0 [ = means LOOK ] *NOT THERE* GO NEXT
24 : H2(24) = 6; S2(24) = 9; sep6 = 8 [ > means GO NEXT ]
24 : H3(24) = 8; S3(24) = 0; sep8 = 15 [ < means LOOK ] FOUND
 
27 : H0(27) = 5; S0(27) = 12; sep5 = 4 [ > means GO NEXT ]
27 : H1(27) = 7; S1(27) = 6; sep7 =5 [ > means GO NEXT ]
27 : H2(27) = 9; S2(27) = 9; sep9 = 8 [ > means GO NEXT ]
27 : H3(27) = 0; S3(27) = 0; sep0 = 15 [ < means LOOK ] FOUND
 
36 : H0(36) = 3; S0(36) = 6; sep3 = 9 [ < means LOOK ] FOUND
 
13 : H0(13) = 2; S0(13) = 13; sep2 = 8 [ > means GO NEXT ]
13 : H1(13) = 3; S1(13) = 2; sep3 = 9 [ < means LOOK ] *NOT THERE = NOT FOUND
 
28 : H0(28) = 6; S0(28) = 13; sep6 = 8 [ > means GO NEXT ]
28 : H1(28) = 8; S1(28) = 2; sep8 = 15 [ < means LOOK ] *NOT THERE = NOT FOUND
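The searches above can be replayed in code while counting how many separators are checked and how many times the data file is read. This sketch hard-codes the final state of the example; the names `SEPARATORS`, `SLOTS` and `search` are illustrative only.

```python
TABLE_SIZE = 11
SIG_MODULUS = 15

# Final state of the worked example, indexed by address.
# The separator at address 5 is S1(25) - 1 = 4 under the signature function above.
SEPARATORS = [15, 15, 8, 9, 0, 4, 8, 5, 15, 8, 15]
SLOTS = [27, None, 35, 36, 14, 16, 17, 25, 24, 29, None]


def search(key):
    """Return (found, separators_checked, file_accesses)."""
    home = key % TABLE_SIZE
    step = (key // TABLE_SIZE) % TABLE_SIZE
    sig = key % SIG_MODULUS
    accesses = 0
    for j in range(TABLE_SIZE):
        addr = (home + j * step) % TABLE_SIZE
        if sig <= SEPARATORS[addr]:
            accesses += 1                       # one read of the data file
            if SLOTS[addr] == key:
                return True, j + 1, accesses
            if sig < SEPARATORS[addr]:
                return False, j + 1, accesses   # it would have been here
            # tie with the separator: keep probing
        sig = ((sig + 1) * key) % SIG_MODULUS   # signature for the next position
    return False, TABLE_SIZE, accesses


for key in (17, 35, 14, 16, 25, 29, 24, 27, 36, 13, 28):
    print(key, search(key))
```

Only key 24 needs a second read of the data file, because its signature ties with the separator at address 4.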
 
Average chain length = 2.11 (19/9) for keys present, but notice how few times we actually had to look into the record file: for keys present, 10/9 = 1.1 accesses on average; the missing keys don't change this much: 12/11 = 1.09. If we count accesses for the keys not found when we don't use the signatures, we get 9 for key(13) and 5 for key(28), which makes the total chain length 33/11 = 3.
 
THE ONLY TIME it will take us > 1 access to retrieve a record or determine that it doesn't exist is when the signature is equal to the separator, and with a decent signature function, this won't happen often.
 
If we now apply this to a bucket scheme there will be even fewer collisions.
 
Something to ponder: can we apply this idea to linear or cyclical dynamic hashing implementations?

