|
Family
Name |
Given
Name(s) |
|
|
|
![]() |

Lecture Section L20
June 27 2003 Time: 120 Minutes
Name:
Student ID:
1. Please answer all questions on the exam paper. IF your
ANSWER IS NOT RIGHT WITH THE QUESTION, MAKE SURE YOU tell me WHERE THE ANSWERS
CAN BE FOUND.
2. Write your name and ID on this page.
3. Answer all questions to the best of your ability.
4. Keep answers short and to the point.
THIS IS AN OPEN BOOK EXAM.
YOU MAY USE ANY BOOKS OR OTHER NOTES YOU FEEL ARE
APPROPRIATE. You may also use: cell-phones,
or any other wireless or other electronic devices.
YOU MAY NOT USE FRIENDS,
|
|
total possible |
your score |
Sub - Total |
|
Question 1 |
15 |
|
|
|
Question 2 |
10 |
|
|
|
Question 3 |
10 |
|
|
|
Question 4 |
10 |
|
|
|
Question 5 |
15 |
|
|
|
TOTAL SCORE |
60 |
|
|

Question
1: (worth 15 total) short (!) answera)
[ 2 marks -bonus ] What is the answer?
b) [ 2 marks ] How
does a Storage Area Network differ from a regular file system managed by a
typical file server?
c)
[ 2 marks ] Why do we need to be concerned about managing space inside a file (why
can't we just let the operating system take care of it?)
d) [ 2 marks ] The NTSC film
standard calls for 24 frames per second while the digital video standard MPEG
calls for 30 frames per second. How is NTSC converted to meet the digital video
standard frame rate?
e)
[ 2 marks ] Explain why 'traditional' pointers (such as char* in C/C++) cannot be used in files.
---see next page for last
part of question 1---
--- see previous page for first parts of question
1---
f)
[ 5 marks ] One of the common threads throughout this course was
that the organization of data touched on similar issues at various levels of
abstraction. Fill in at least five more boxes in the following table:
|
Level of Abstraction |
Highest Level |
Index Equivalent |
One Element |
Subdivision of one Element |
|
|
Within a Record: |
One Record |
|
|
|
|
|
Within a File: |
The Whole File |
Index of Keys & Pointers |
Record |
Field |
Data Item (such as a single char of a string) |
|
Within a File System (such as in a UNIX partition): |
File System |
|
|
|
|
|
Within a Storage Network: |
Network FS |
|
|
File |
|
|
Within an Object Repository: |
The Entire Collection |
|
Object |
|
|
Question
2: (worth 10 total) Cosequential Processing
Estimate (calculate) the cost (in seeks) of creating the
initial runs of a major sort by implementing the first phase of a radix sort,
followed by a merge to produce a single sorted file. Assume it is possible to break up the original file
(using the keys) into 17 roughly equal-sized sets of records. Compare the cost
of this kind or sort-merge against the calculations below for a standard
polyphase sort-merge of 17 sorted runs. Be sure to explain your calculations
(especially important if your numbers are wrong or confusing). It can be
assumed that one uninterrupted, sequential pass through an entire file (or run)
will cost one seek. Compare the two totals: including the cost of creating the
runs and the cost of merging them). Think carefully. You may or may not require
all the information below.
|
For a POLYPHASE Merge of 17 runs: |
|||||
|
|
D1 |
D2 |
D3 |
D4 |
Seeks: |
|
0 |
7[1] |
4[1] |
6[1] |
- |
Phase 0: we don't count seeks here, they
are counted when we create the runs |
|
1 |
3[1] |
- |
2[1] |
4[3] |
Phase 1: 4 sets of 3 runs: : 1/3 of RAM for each = 333,333
Bytes = 333 Records : 1/3 of each run = 3 seeks/run *
3 runs = 9 seeks : 4 sets to do = 9 * 4 = 36 seeks
[36 SEEKS] |
|
2 |
1[1] |
2[5] |
- |
2[3] |
Phase 2: 2 sets of 3 runs (1 run in each
set has 3,000 records in it) : 1/3 RAM = 333 records : 3 seeks/run * 3 runs = 9 seeks,
but last file has 2,000 records left, so 6 more seeks for it = 15 seeks
per set : 2 sets = 2 * 15 = 30 seeks [30
SEEKS] |
|
3 |
- |
1[5] |
1[9] |
1[3] |
Phase 3: 1 set of 3 runs (1 has 1000 rec.,
1 has 3000 rec., 1 has 5000 rec.) : 3 seeks/run * 3 runs = 9 seeks,
but 2 runs have at least 2000 records left, so 6 more seeks for each; 6
* 2 = 12 MORE seeks, but the last run still has 2000
records left so 6 MORE for it : 9 seeks + 12 seeks + 6 seeks =
27 seeks [27 SEEKS] |
|
4 |
1[17] |
- |
- |
- |
Phase 4: 1 set of 3 runs (lengths 9000,
5000, 3000 records) : smallest size is 3000 rec. so
1/3 RAM = 9 seeks/run so 9 seeks/run * 3 runs = 27 seeks : now 2 runs still have at least
2000 records left so 6 seeks for each * 2 of them = 12
seeks : now last run has 4000 records
left so 3*4 = 12 seeks : 27 seeks + 12 seeks + 12 seeks =
53 seeks [53 SEEKS] |
|
36 + 30 + 27 + 53 = 146 SEEKS
TOTAL +
17 seeks to break the file into the initial runs = 163 |
|||||
Radix Sort-Merge Total:____________ Polyphase Sort-Merge Total: 163
seeks
You
can use it if you like.

Feature:
What does
this mean?
Why is this
good?
A difficulty:
A potential
Solution:
Feature:
What does
this mean?
Why is this
good?
A difficulty:
A potential
Solution:

Question 4: (worth 10 total) Assignment Related
The
following are 5 problems encountered by various groups while working on their
projects. For each one,
suggest a solution or means of addressing this problem. Please suggest
concrete, implement-able solutions (“should ‘a ‘been smarter” or “start sooner”
are not acceptable). Assume that acquiring more time, or physical resources
(i.e. faster/better machines, etc.) are not options. Any solutions must be
applicable within the same time frame. An example follows:
Problem: The group acquired code for
reading images/sounds (as opposed to writing it themselves), and assumed they would
be able to make it work with their code on their systems. When they were ready
to start trying out their algorithms, they discovered that they couldn’t get
the acquired code to work.
Solution: - prototyping (Get a
working program up and running as fast as possible. In this case it would have
included writing & testing the “wrapper” code to load images – even before
the algorithms for manipulating those images had been created.)
1.
The group couldn’t
settle on a project (task) to pursue. What constraints could they have devised
to help define their problem space?
2.
Two different
algorithms for image matching were tried – neither produced any acceptable
results. There was insufficient time to try others. How could they have used
what they were discovering to get farther than “tried this – didn’t work”?
3.
Two algorithms were
tried and seemed promising but the group did not have enough sample data.
Images were created by converting existing images to .bmp using “MSPaint” (by
hand). There was insufficient time to convert more in this fashion. How could
they have automated this process?
4.
The group was unable to
find reference information on the ‘Net, even though they knew the information
existed. What if references were found, but the resources referred to were not
on-line?
5.
Group wanted to examine
MP3 files but could not acquire file format info. Could they attempt to extract
information from the files themselves? How?
You
can use it if you like.Suppose we are asked to
design the file structures for a set of files to be used for an extended study
that will be followed by an exhaustive analysis. The project is intended to
study relationships between health and longevity of populations (of people) and
their environment.
The subjects (let's say up to 5 million of them) of this
study are grouped according to 25 major categories (such as urban/rural;
community size; use of electricity; sanitation facilities; typical modes of
travel; mobility; etc.)
The information to be gathered on these populations
includes things like birth/death (and cause), diet, activities, medical records
(including lab results and X-rays), and several thousand other attributes.
The study is to span 25 years with information gathered retroactively
from the first 15 years (using existing information) and on an on-going basis
for the next 10 years. Part of the project involves collecting and organizing
15 years’ worth of existing data. We do not yet know all the questions that
will be asked in the study, and we want to leave the possibility for
un-targeted data mining available to us.
List
and discuss 3 (three) major considerations [for 5 marks each] that are
crucial in this design (An Example: Should we use
standard file formats whenever possible? If we do, current software will exist
to manipulate it but we may have to be prepared to support that software
ourselves or convert the files to new formats as time passes and older formats
are 'discontinued'. If we don't, we may have difficulty with portability; there
will be potentially more overhead in training people to work with the data as
well as in the initial set-up costs).

------this
page intentionally left blank------