461 - Hashing Review

HASHING REVIEW QUESTIONS

SHORT ANSWER QUESTIONS

[ 4 marks ] How does Signature Hashing ensure single seek file access?
[ 4 marks ] Why is deletion of keys/records from a hash table loaded using open addressing so problematic?
[ 4 marks ] What determines the bit-size of separators in the separator table when implementing Signature Hashing?
[ 3 marks ] What does the term 'cellar' refer to when applied to hash tables?
[ 4 marks ] How does 2-pass loading affect the distribution of records in a hashed file?
[ 4 marks ] How does using chaining rather than progressive overflow affect the performance of record retrieval in a hashed file?
[ 4 marks ] What is the primary feature that distinguishes extendible hashing from dynamic hashing?
[ 4 marks ] In a hashed file that uses Synonym Chaining to resolve collisions, a key must always be findable by starting at its home address. What technique guarantees this, and how does it solve the problem?
[ 4 marks ] What is a trie?
[ 2 marks ] What is a perfect hashing function?
[ 2 marks ] What is the major requirement in a file's behaviour that would make the use of hashing a reasonable option?

LONG QUESTIONS

(worth 10 total)
When implementing a dynamic linear hashing algorithm for an address space consisting of N buckets, how many overflow buckets do we need to have in reserve? Explain (annotated diagrams are acceptable).
(worth 15 total)
Choose 3 (three) collision resolution techniques used in loading hash tables and describe a situation for each where you would / would NOT use it.

(worth 25 total)
Using the following simple hashing and incrementing function, insert the given list of keys into the table using Computed Chaining for collision resolution IN TWO DIFFERENT WAYS. First, insert the keys in order as they appear in the list. Then do it again using a 2-pass load. 'Re-draw' the table as necessary to keep moved keys visible, show calculations and explain movement of keys. When both tables are complete, list the chain lengths for each key (single and 2-pass); compare and comment.
Record Keys: 14, 51, 37, 18, 16, 25, 36, 17, 11, 27
Table Size = 11; Hash Function = hash(key) = key mod 11
Incrementing Function = i(key) = Quotient (Key / 11) mod 11
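
The two functions above can be sketched in C++ as follows. This is only an illustration of the arithmetic, assuming non-negative integer keys; the names TABLE_SIZE, hashKey and increment are hypothetical and do not come from the question.

```cpp
// Sketch only: the names TABLE_SIZE, hashKey and increment are
// illustrative assumptions, not part of the question.
const int TABLE_SIZE = 11;

// Home address: remainder of key / TABLE_SIZE.
int hashKey(int key)
{
  return key % TABLE_SIZE;
}

// Increment: integer quotient of key / TABLE_SIZE, reduced mod TABLE_SIZE.
int increment(int key)
{
  return (key / TABLE_SIZE) % TABLE_SIZE;
}
```

For a key that is not in the list, say 40, hashKey(40) gives 7 and increment(40) gives 3. A key below 11 would yield an increment of 0; none of the listed keys does.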

#	14	51	37	18	16	25	36	17	11	27
hash(#)
i(#)

probe count	14	51	37	18	16	25	36	17	11	27
single-pass
2-pass load

(worth 25 total)
Using the following simple hashing and incrementing function, insert the given list of keys into the table using Brent's Method for collision resolution. 'Re-draw' the table as necessary to keep moved keys visible, show calculations and explain movement of keys. Compute and show the s value for each record that can't be stored at its home (or immediate next) address and the table of i,j values for checking for possible moves.
Record Keys: 14, 51, 37, 18, 16, 25, 36, 47, 15, 27
Table Size = 11; Hash Function = hash(key) = key mod 11
Incrementing Function = i(key) = Quotient (Key / 11) mod 11

#	14	51	37	18	16	25	36	47	15	27
hash(#)
i(#)

For each number that might be better moved, compute the s value and step through the i,j increments in order, stating which key should be moved, where it would be moved to, and how many table accesses would be involved. When you find a viable move, state how many extra accesses there are for the value being moved, how many accesses remain for the incoming value, and what the net gain is.

(worth 25 total)
Insert the given keys into the table provided in TWO different ways:
1. using synonym chaining
2. using coalesced hashing
Compare your results in terms of amount of work to load the tables as well as the average chain lengths. What are your conclusions?
(worth 10 total)
When used in Extendible Hashing, tries are normally made by splitting buckets as they fill. The hash function used gives us the address of the root of a particular trie, where each 'node' (leaf) in the structure consists of a bucket containing the actual records. Given that we know the tries for each bucket in our application may grow quite deep, suggest an alternate approach for extending the address space (while keeping the same number of initial addresses - i.e. using a dynamic addressing scheme is not an option). Justify your answer.
(worth 10 total)
The Binary Tree Insertion Method of Collision Resolution in hash tables is quite complicated and may result in the movement of a LOT of keys (records) by the time the entire table is loaded. Why bother? In other words, why implement such a complicated and time-consuming algorithm to load a hash table? Under what circumstances might you choose to implement this algorithm rather than a simpler method?
(worth 25 total)
Examine the following code segments for doing Binary Tree Insertion into a Hash Table. The trees built using Binary Tree Insertion tend to have many duplicated branches. Suggest a way to modify the algorithm so that redundant branches are NOT extended farther than necessary. Pseudo-code and other explanations (including annotated pictures) are acceptable. Actual code is acceptable only if properly explained.

void initnode ( node* N, int sival, int isrc, node* p1 )
{
   // initialize a new node

  N->S = step(N->K);
  N->Si = sival;
  N->rchild = isrc;
  N->parent = p1;
  N->sibling = NULL;
  N->left = NULL;
  N->right = NULL;

  return;
} // initnode

//-------------------------------------------------------//
void deltree ( node* P )
{
  // Done: release the tree rooted at P.
  // Each recursive call deletes the child node itself, so the
  // children must not be deleted again here (double delete).

  if (P->left != NULL)
    deltree( P->left );
  if (P->right != NULL)
    deltree( P->right );
  delete P;
  return;
} // deltree

//-------------------------------------------------------//
void placebti( int key )
{
  // collision resolution technique: Binary Tree Insertion
  //
  // The tree represents a series of moves to be made to try
  // and minimize the average length of the probe chains.
  // Tree is used as a decision tree
  //
  // don't need to do regular traversals
  // do need the ability to do a breadth-first traversal

  int homeaddr;

  int levels = 1;
  node* root; // usual use
  node* incoming; // the key being placed
  node* parent; // nearest parent to current level (N)
  node* first; // first node at this level (N+1)
  node* newnode; // new node just created (level N+1)
  node* last; // most recently created node before newnode

  // check for available home address
  homeaddr = hash( key );
  if (Table[homeaddr] == EMPTY)
  {
    Table[homeaddr] = key;
  }
  else // Starting to build the tree
  {

    incoming = new(node);
    incoming->K = key;
    incoming->loc = homeaddr;
    initnode( incoming, 0, YES, NULL );

    root = new(node);
    // what's at 'key's home address?
    root->K = Table[incoming->loc];
    root->loc = incoming->loc;
    initnode( root, incoming->S, YES, incoming );

    // filling in the tree
    parent = root;
    first = NULL;
    last = NULL;

    // keep going till we find an empty address
    while ( parent->K != EMPTY )
    {
      // build the left child
      newnode = new(node);
      parent->left = newnode;

      // check for beginning of level N+1
      if (first == NULL)
      {
        first = newnode;
        last = NULL; // beginning of new level has no last
        // must do this to prevent circular references
        levels++;
      }
      // get next position in probe chain
      newnode->loc = next( parent->loc, parent->Si );

      newnode->K = Table[newnode->loc];
      initnode( newnode, parent->Si, NO, parent );
      // check if we have a sibling to our left
      if ( last != NULL )
        last->sibling = newnode;
      last = newnode;

      // build right child
      if (newnode->K != EMPTY)
      {
        newnode = new(node);
        parent->right = newnode;
        // get next position in probe chain
        newnode->loc = next( parent->loc, parent->S );

        newnode->K = Table[newnode->loc];
        initnode( newnode, parent->S, YES, parent );
        // MUST have left sibling
        if ( last != NULL )
          last->sibling = newnode;
        last = newnode;

        // check to see if we're done...
        if (newnode->K == EMPTY)
          break;
        else
        {
          // get next parent (this level) to work on, or go to next level
          if (parent->sibling != NULL)
            parent = parent->sibling;
          else // next level
          {
            parent = first;
            last = NULL;
            first = NULL;
          }
        } // checking if we're done
      } // build right child

      else // no need means we're done
        break;

    } // while not empty

    // Moving the Keys PART 2
    // 'last' is the node which references the empty slot in the Table
    // if it is a right node then move the key@parent into the
    // specified location
    // if it is a left node copy the current location given by 'last'
    // into the parent

    while ( last != root )
    {
      if (last->rchild)
      {
        // do the move
        Table[last->loc] = last->parent->K;
      }
      else
      {
        // copy current into parent
        last->parent->loc = last->loc;
      }
      // back up to previous level
      last = last->parent;
    }

    // place incoming key once we get to the root
    Table[root->loc] = key;

    // Done - release tree
    deltree( root );
    delete incoming;

  } // end collision

  return;
} // placebti
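
The code segments above rely on declarations that are not shown: the node type, the Table array, the EMPTY, YES and NO constants, and the hash, step and next helpers. A minimal sketch of what they might look like is given here, purely as an assumption so that the segments can be read in context. The table size of 11, the sentinel value -1, and the mod-11 hash and step functions are borrowed from the other questions in this review and are not confirmed for this listing; initTable is a hypothetical helper.

```cpp
// Sketch only: every value and definition here is an assumption made
// for illustration; the original declarations are not shown.
const int TABLE_SIZE = 11;  // assumed table size
const int EMPTY = -1;       // assumed marker for an unused table slot
const int YES = 1;
const int NO = 0;

int Table[TABLE_SIZE];

struct node
{
  int K;         // key found at loc (EMPTY if the slot is free)
  int loc;       // table address this node refers to
  int S;         // step (increment) computed from K
  int Si;        // increment carried along the chain that reached this node
  int rchild;    // YES if built as a right child (incoming and root also use YES)
  node* parent;  // node one level up in the decision tree
  node* sibling; // next node to the right on the same level
  node* left;    // continue along the current probe chain
  node* right;   // move the resident key along its own probe chain
};

// Home address of a key (assumed: key mod table size).
int hash(int key)
{
  return key % TABLE_SIZE;
}

// Increment for a key (assumed: quotient mod table size).
int step(int key)
{
  return (key / TABLE_SIZE) % TABLE_SIZE;
}

// Next address on a probe chain, wrapping around the table.
int next(int loc, int inc)
{
  return (loc + inc) % TABLE_SIZE;
}

// Hypothetical helper: mark every slot as unused before loading.
void initTable()
{
  for (int i = 0; i < TABLE_SIZE; i++)
    Table[i] = EMPTY;
}
```

With these definitions, next(9, 4) wraps around to address 2. Because hash and next are also names in the C++ standard library, a program using this sketch should avoid 'using namespace std;'.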
(worth 20 total)
Suppose we are implementing an extendible (bucket) hashing scheme. Each bucket holds ‘b’ records. The current arrangement is as follows:

a) [5 marks] What would be the result (i.e. how would the structure change) if, after a deletion from bucket B, the total number of records in B and C is <= ‘b’? Explain.
b) [5 marks] What would be the result if, after a deletion from bucket C, the total number of records in C and D is <= ‘b’? Explain.
c) [10 marks] Write the pseudocode (or explain the algorithms) required to ‘grow’ and ‘shrink’ the extended table. (Remember that all records in this picture hash to the same address.)

(worth 14 total)
Using the following simple hashing and incrementing function, insert the given list of keys into the table using Binary Tree Insertion for collision resolution. 'Re-draw' the table as necessary to keep moved keys visible, show calculations and explain movement of keys. 'Compute' and show the decision tree for each record that can't be stored at its home (or immediate next) address.
Record Keys: 58, 51, 37, 18, 38, 25, 36, 47, 15, 27
Table Size = 11; Hash Function = hash(key) = key mod 11
Incrementing Function = i(key) = Quotient (Key / 11) mod 11

#	58	51	37	18	38	25	36	47	15	27
hash(#)
i(#)

Once a suitable empty location has been found, state, in order, which keys are to be moved and where they should go.