CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified October 21, 2001 01:12 PM

GENERAL REVIEW QUESTIONS

SHORT ANSWER QUESTIONS

  1. [ 5 marks ] The Fast String Searching Algorithm checks the last character of the pattern against the one in the search string and then if they are different looks for STRING[i] in the pattern to determine the next alignment (of pattern with string). How can we do this (checking STRING[i] against all characters in PATTERN) without actually doing len(PATTERN) comparisons?
  2. [ 3 marks ] In the 'Paging Game', what is it called when you give the number of a thing? What happens to the thing if you don't do this?
  3. [ 2 marks ] Under what conditions does efficiency (of code; access to data; etc.) become at least as important as reusability and maintainability in a program?
  4. [ 4 marks ] Name 2 reasons why the use of builtin utilities on UNIX is better than writing your own code. Explain them.
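
One possible line of attack for short answer question 1, sketched under assumptions: precompute a table, indexed by character, that records how far the pattern can slide when that character is seen in the string. The sketch below uses a Horspool-style simplification of fast string searching; the names build_shift_table and fast_search are hypothetical and not taken from the course material.

```python
def build_shift_table(pattern):
    """Map each character of pattern (except the last) to the slide distance
    that lines up its rightmost occurrence with the current string position."""
    m = len(pattern)
    shift = {}
    for index in range(m - 1):
        shift[pattern[index]] = m - 1 - index  # later occurrences overwrite earlier ones
    return shift


def fast_search(pattern, string):
    """Return the index of the first occurrence of pattern in string, or -1."""
    m, n = len(pattern), len(string)
    if m == 0:
        return 0
    shift = build_shift_table(pattern)
    i = m - 1  # string position aligned with the last pattern character
    while i < n:
        k = 0
        # Compare right to left, starting at the last pattern character.
        while k < m and pattern[m - 1 - k] == string[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # One table lookup replaces len(pattern) comparisons against string[i].
        # Characters that do not occur in the pattern slide it by its full length.
        i += shift.get(string[i], m)
    return -1
```

The table is built once per pattern in time proportional to len(PATTERN), so each realignment costs a single lookup no matter how long the pattern is.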

LONG QUESTIONS

  1. (worth 10 total) Given a choice between using extendible hashing and a B+ tree, what factors about the problem at hand would govern your choice of an approach?

  2. (worth 36 total) The following is a list of file structures as examined in class. They are not necessarily mutually exclusive. Assume that all are 'keyed' in the sense that all contain records that are uniquely identified by keys. All files hold 1,000,000 variable-length data records, where it is assumed that the average record length is 100 bytes and the maximum record length is 50 bytes. For simplicity, assume that data records are not blocked.
    Fill in the table below by writing numerical values wherever possible. Where it is not possible to give numerical values, give a formula. If neither is possible, explain why.


  3. (worth 10 total) Given the following approaches, identify which principle(s) from "The Book of Rules" is(are) being applied.
    a) Keysort
    b) Signature Hashing
    c) Dynamic Hashing
    d) Tries
    e) Cascade Merge-Sort
    The File Architect's Book of Rules

    1. If it looks like too much work, try to find a simpler way.
    2. Store once, process often.
    3. Make it faster by doing less.
    4. Compute information when storage is limited (e.g. using address links rather than actual addresses in computed chaining)
    5. Store only information that needs to be accessed and cannot be readily computed (i.e. if we have a date of birth we don't need to store age).
    6. Spend additional time structuring information prior to its use to reduce subsequent processing time (e.g. inverted files)
    7. Subdivide the search space so only a portion needs to be considered when searching (classification hashing; B*Trees).
    8. Use a separate data structure to control organization of data in a storage structure (e.g. many statistical and dictionary methods of data compression; tournament trees for merge)
    9. Keep some condensed information in primary memory to improve access time to the actual data (e.g. maintaining bit strings to identify occupied locations for linear quotient insertions)
    10. Order data to improve search efficiency (e.g. binary search).
    11. Consider storing information according to different characteristics.
    12. Use a level of indirection to gain flexibility in accessing information (e.g. Grid Files)
    13. If the information is not readily accessible for the task at hand in a given form, then convert it to something more convenient.
    14. Use additional storage to save processing time (text signatures; Bloom Filters)
    15. Search an encoded form of the information rather than the actual information (e.g. signature hashing; Bloom Filter)
    16. Eliminate unnecessary decisions (e.g. Patricia Trees)
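
Rules 14 and 15 both name the Bloom Filter as an example. As an illustration only, here is a minimal sketch of one, assuming a bit array of 1024 bits and 3 hash functions derived from SHA-256 with a seed prefix. The class name, method names, and parameters are hypothetical choices for this sketch.

```python
import hashlib


class BloomFilter:
    """A small bit array that answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # extra storage (rule 14)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # Only the encoded form is searched (rule 15). False positives are
        # possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A negative answer means the key is certainly not stored, so the actual file is read only when might_contain returns True.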


  4. (worth 20 total) Given the following pairs of approaches to File Organization, identify the tradeoffs involved (i.e. what do you gain by using one rather than the other and what is the cost?)
    a) B-Trees vs. Extendible Hashing
    b) Signature Hashing vs. Static Hashing (loaded using Brent's Method)
    c) Compressed vs. Uncompressed files for documents
    d) B+Trees vs. Dynamic Hashing
    e) Multi-Indexed Files vs. Ring Files

  5. (worth 25 total) You are given the task of designing a file system for a large bookstore chain. They need to keep track of inventory and sales. The system must allow multi-user access. By far the majority of their inventory consists of books but they also sell a few other items (packaged software, audio tapes/CDs, some stationery). Give an overview of the file system design – including what hardware you would recommend. Describe the record structure you would use, whether there would be one or many files and how they would be interrelated. Explain how information will be accessed/indexed and any special features that would be included. Use diagrams, written explanations or pseudocode as appropriate. Include major issues/questions that need to be addressed before the design could be finalized.

  6. (worth 20 total)
    Name and explain one key factor affecting choices about file design (exclude references to hardware) for each of the following applications:
    1. Games
    2. Medical Records for a large hospital
    3. Astronomical Image Processing System
    4. An Internet Database
    5. A Groupware System
    6. A Large Website (> 5,000 pages)
    7. A Single User PC File System
    8. On-Line Oxford Dictionary
    9. Archival System for photos for a Fashion Magazine
    10. A Newspaper Publisher

  7. (worth 20 total)
    You are given the task of designing the file structures for an interactive storybook series. Each book resides on its own CD and must fit on a single CD. Books have between 20 and 50 pages and there are up to 100 "hotspots" on each page (regions that, when clicked with the mouse, cause some activity, sound or animation). The kinds of things activated by hotspots may be:
    1. short animation (e.g. the gopher dances and sings a song when clicked) lasting 5-20 seconds (Note: this animation need not be located near the mouse click). Some involve small parts of the current picture but others involve the entire picture.
    2. a question may be posed requiring input from the user (remember this is aimed at small children)
    In addition to the user-activated animation, there will be a narration of the story that is invoked automatically when the page is 'opened' which includes the accompanying text being highlighted (larger/different colour), and may include other animation/sounds on the page.
    It must be possible to move forwards and backwards through pages one by one, go to the end/beginning, and choose a specific page to go to.
    Total amount of animation/sound amounts to about 2 hours per story.
    Files must be well organized and efficiently accessed.
    Question: Describe (using pictures if necessary) the file structures you would use; how they would be interrelated; what and how much of the CD would need to be loaded onto the hard drive (and what will stay on the CD). Justify all your decisions.

  8. (worth 25 total)
    Suppose you are trying to compare some number N of similar images against each other, where N ≥ 5. Each image is 1K × 1K bytes (1 byte/pixel) uncompressed. You have space for only 3 (complete) images in memory at one time. You need to find the median value for each pixel and then generate a new composite image that is made of just the median values. Efficiency is a concern and you want to avoid spending all your time opening and closing files. What should you do?
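
One possible approach to question 8, sketched under assumptions: open every image once, then process the images in horizontal strips sized so that one strip from each of the N inputs plus one output strip fit in the memory budget. The function name median_composite, the raw row-major file layout, and the choice of the upper median when N is even are all assumptions of this sketch.

```python
import io


def median_composite(images, out, width, height, budget_bytes):
    """Write the per-pixel median of the open image files to out.

    images: binary file objects, each holding width*height bytes,
            1 byte per pixel, stored row by row (assumed layout).
    """
    n = len(images)
    # Rows per strip so that n input strips plus one output strip fit in memory.
    rows_per_strip = max(1, budget_bytes // ((n + 1) * width))
    for top in range(0, height, rows_per_strip):
        rows = min(rows_per_strip, height - top)
        strips = []
        for f in images:  # files stay open; only seek and read
            f.seek(top * width)
            strips.append(f.read(rows * width))
        # zip(*strips) yields, for each pixel position, its n values.
        out.write(bytes(sorted(values)[n // 2] for values in zip(*strips)))


if __name__ == "__main__":
    # Tiny 2 x 2 demonstration with five in-memory "files".
    data = [bytes([1, 2, 3, 4]), bytes([4, 3, 2, 1]), bytes([9, 0, 3, 3]),
            bytes([0, 0, 0, 0]), bytes([5, 5, 5, 5])]
    result = io.BytesIO()
    median_composite([io.BytesIO(d) for d in data], result, 2, 2, 12)
    print(list(result.getvalue()))
```

With 1K × 1K images and room for 3 complete images, budget_bytes would be 3 × 1024 × 1024, so each file is read in a handful of large sequential chunks and is opened only once.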

  9. (worth 20 total)
    You have been given the task of designing an efficient spell-checker. You want a design that will avoid having to search each string from your document(s) in a sequentially ordered list of words as this is too slow. We need to be able to maintain multiple dictionaries separately. We need the option of searching as many (1, some, or all) of the dictionaries as required each time, but we don’t want to simply combine them into one great big list. Dictionaries may be modified. What kind of format(s) would you use for the dictionaries? Keep in mind that words typically do not exceed 20 characters so the overhead involved with a linked structure may not always be worth it (since each link will cost at least 4 additional bytes). Design the file structure(s) for this spell-checker. Justify all decisions. Diagrams and pseudocode welcome.

