DEPARTMENT OF COMPUTER SCIENCE

THE UNIVERSITY OF CALGARY

CPSC 461 Spring 2003

FINAL EXAMINATION

 

June 27 2003

8:00-10:00 MFH 164

Time: 120 Minutes

 

 

Family Name

Given Name(s)

 

 

 



NOTE: Many questions have > 1 correct answer and in some cases full marks will be given for answers that while not perfect, are still 'good enough'.
DEPARTMENT OF COMPUTER SCIENCE

THE UNIVERSITY OF CALGARY

            COMPUTER SCIENCE 461

FINAL EXAMINATION

Lecture Section L20

June 27 2003                    Time: 120 Minutes

Name:                

Student ID:             

 

INSTRUCTIONS

1.  Please answer all questions on the exam paper. IF your ANSWER IS NOT RIGHT WITH THE QUESTION, MAKE SURE YOU tell me WHERE THE ANSWERS CAN BE FOUND.

2. Write your name and ID on this page.

3. Answer all questions to the best of your ability.

4. Keep answers short and to the point.

 

THIS IS AN OPEN BOOK EXAM.

YOU MAY USE ANY BOOKS OR OTHER NOTES YOU FEEL ARE APPROPRIATE. You may also use: cell-phones, or any other wireless or other electronic devices.

YOU MAY NOT USE FRIENDS,

 

 

total possible

your score

Sub - Total

Question 1

15

 

 

Question 2

10

 

 

Question 3

10

 

 

Question 4

10

 

 

Question 5

15

 

 

TOTAL SCORE

60

 

 

 



Question 1: (worth 15 total) short (!) answer

a)   [ 2 marks -bonus ] What is the answer?

b)  [ 2 marks ] How does a Storage Area Network differ from a regular file system managed by a typical file server?

c)   [ 2 marks ] Why do we need to be concerned about managing space inside a file (why can't we just let the operating system take care of it?)

d)  [ 2 marks ] The NTSC film standard calls for 24 frames per second while the digital video standard MPEG calls for 30 frames per second. How is NTSC converted to meet the digital video standard frame rate?

e)   [ 2 marks ] Explain why 'traditional' pointers (such as char* in C/C++) cannot be used in files.

 

---see next page for last part of question 1---

--- see previous page for first parts of question 1---

 

f)     [ 5 marks ] One of the common threads throughout this course was that the organization of data touched on similar issues at various levels of abstraction. Fill in at least five more boxes in the following table:

Level of Abstraction

Highest Level

Index Equivalent

One Element

Subdivision of one Element

 

Within a

Record:

One Record

 

 

 

 

Within a File:

The Whole File

Index of Keys & Pointers

Record

Field

Data Item (such as a single char of a string)

Within a File System (such as in a UNIX partition):

File System

 

 

 

 

Within a Storage Network:

Network FS

 

 

File

 

Within an Object Repository:

The Entire Collection

 

Object

 

 

 


Question 2: (worth 10 total) Cosequential Processing

Estimate (calculate) the cost (in seeks) of creating the initial runs of a major sort by implementing the first phase of a radix sort, followed by a merge to produce a single sorted file. Assume it is possible to break up the original file (using the keys) into 17 roughly equal-sized sets of records. Compare the cost of this kind or sort-merge against the calculations below for a standard polyphase sort-merge of 17 sorted runs. Be sure to explain your calculations (especially important if your numbers are wrong or confusing). It can be assumed that one uninterrupted, sequential pass through an entire file (or run) will cost one seek. Compare the two totals: including the cost of creating the runs and the cost of merging them). Think carefully. You may or may not require all the information below.

 

For a POLYPHASE Merge of 17 runs:

 

D1

D2

D3

D4

Seeks:

0

7[1]

4[1]

6[1]

-

Phase 0: we don't count seeks here, they are counted when we create the runs

1

3[1]

-

2[1]

4[3]

Phase 1: 4 sets of 3 runs:

: 1/3 of RAM for each = 333,333 Bytes = 333 Records

: 1/3 of each run = 3 seeks/run * 3 runs = 9 seeks

: 4 sets to do = 9 * 4 = 36 seeks [36 SEEKS]

2

1[1]

2[5]

-

2[3]

Phase 2: 2 sets of 3 runs (1 run in each set has 3,000 records in it)

: 1/3 RAM = 333 records

: 3 seeks/run * 3 runs = 9 seeks, but last file has 2,000 records left,

so 6 more seeks for it = 15 seeks per set

: 2 sets = 2 * 15 = 30 seeks [30 SEEKS]

3

-

1[5]

1[9]

1[3]

Phase 3: 1 set of 3 runs (1 has 1000 rec., 1 has 3000 rec., 1 has 5000 rec.)

: 3 seeks/run * 3 runs = 9 seeks, but 2 runs have at least 2000 records

left, so 6 more seeks for each; 6 * 2 = 12 MORE seeks, but

the last run still has 2000 records left so 6 MORE for it

: 9 seeks + 12 seeks + 6 seeks = 27 seeks [27 SEEKS]

4

1[17]

-

-

-

Phase 4: 1 set of 3 runs (lengths 9000, 5000, 3000 records)

: smallest size is 3000 rec. so 1/3 RAM = 9 seeks/run so

9 seeks/run * 3 runs = 27 seeks

: now 2 runs still have at least 2000 records left so

6 seeks for each * 2 of them = 12 seeks

: now last run has 4000 records left so 3*4 = 12 seeks

: 27 seeks + 12 seeks + 12 seeks = 53 seeks [53 SEEKS]

36 + 30 + 27 + 53 = 146 SEEKS TOTAL  + 17 seeks to break the file into the initial runs = 163

 

Radix Sort-Merge Total:____________                Polyphase Sort-Merge Total: 163 seeks

 

 

 

 

 

 

 

 

 

 


This page left blank. You can use it if you like.
Question 3: (worth 10 total, 5 each) Digital Libraries

Choose any two (2) of the following features of a digital library, and explain what it means in the context of digital libraries and why it is beneficial. Then describe one difficulty with the implementation of this feature in the context of digital libraries and outline a potential solution. Keep your answers short (remember, each part is worth 1 mark).


 


  1. Accessible via web browser.
  2. Runs on multiple, popular platforms.
  3. Permits full-text and fielded search.
  4. Offers flexible browsing techniques.
  5. Creates access structures automatically.
  6. Makes use of available metadata.
  7. Capabilities can be extended by plug-ins.
  8. Can handle documents in any language.
  9. Can display user interface in multiple languages.
  10. Can handle collections of text, pictures, audio, video.
  11. Allows hierarchical browsing.
  12. Designed for multi-gigabyte collections.
  13. Uses compression techniques.
  14. Permits authentication of users.
  15. Offers user logging.
  16. Provides an administrative function.
  17. Updates and adds new collections dynamically.
  18. Publishes collections on CD-ROM.
  19. Supports distributed collections.


Feature:

What does this mean?

Why is this good?

A difficulty:

A potential Solution:

Feature:

What does this mean?

Why is this good?

A difficulty:

A potential Solution:


 

 

 

 

 

 

 

 

 

 

 

 


Question 4: (worth 10 total) Assignment Related

The following are 5 problems encountered by various groups while working on their projects. For each one, suggest a solution or means of addressing this problem. Please suggest concrete, implement-able solutions (“should ‘a ‘been smarter” or “start sooner” are not acceptable). Assume that acquiring more time, or physical resources (i.e. faster/better machines, etc.) are not options. Any solutions must be applicable within the same time frame. An example follows:

Problem: The group acquired code for reading images/sounds (as opposed to writing it themselves), and assumed they would be able to make it work with their code on their systems. When they were ready to start trying out their algorithms, they discovered that they couldn’t get the acquired code to work.

Solution: - prototyping (Get a working program up and running as fast as possible. In this case it would have included writing & testing the “wrapper” code to load images – even before the algorithms for manipulating those images had been created.)

1.      The group couldn’t settle on a project (task) to pursue. What constraints could they have devised to help define their problem space?

2.      Two different algorithms for image matching were tried – neither produced any acceptable results. There was insufficient time to try others. How could they have used what they were discovering to get farther than “tried this – didn’t work”?

3.      Two algorithms were tried and seemed promising but the group did not have enough sample data. Images were created by converting existing images to .bmp using “MSPaint” (by hand). There was insufficient time to convert more in this fashion. How could they have automated this process?

4.      The group was unable to find reference information on the ‘Net, even though they knew the information existed. What if references were found, but the resources referred to were not on-line?

5.      Group wanted to examine MP3 files but could not acquire file format info. Could they attempt to extract information from the files themselves? How?

 

This page left blank. You can use it if you like.
Question 5: (worth 15 total) Comprehensive


Suppose we are asked to design the file structures for a set of files to be used for an extended study that will be followed by an exhaustive analysis. The project is intended to study relationships between health and longevity of populations (of people) and their environment.

            The subjects (let's say up to 5 million of them) of this study are grouped according to 25 major categories (such as urban/rural; community size; use of electricity; sanitation facilities; typical modes of travel; mobility; etc.)

            The information to be gathered on these populations includes things like birth/death (and cause), diet, activities, medical records (including lab results and X-rays), and several thousand other attributes.

            The study is to span 25 years with information gathered retroactively from the first 15 years (using existing information) and on an on-going basis for the next 10 years. Part of the project involves collecting and organizing 15 years’ worth of existing data. We do not yet know all the questions that will be asked in the study, and we want to leave the possibility for un-targeted data mining available to us.

            List and discuss 3 (three) major considerations [for 5 marks each] that are crucial in this design (An Example: Should we use standard file formats whenever possible? If we do, current software will exist to manipulate it but we may have to be prepared to support that software ourselves or convert the files to new formats as time passes and older formats are 'discontinued'. If we don't, we may have difficulty with portability; there will be potentially more overhead in training people to work with the data as well as in the initial set-up costs).



------this page intentionally left blank------