461 - Cosequential Processing Review

COSEQUENTIAL PROCESSING REVIEW QUESTIONS

SHORT ANSWER QUESTIONS

[ 4 marks ] Why does the Cascade Merge-Sort outperform the Polyphase Merge-Sort when we have more than 5 devices?

[ 4 marks ] Why is heapsort a good choice for sorting large files?

[ 2 marks ] What do we normally count when we are comparing sorting algorithms?

[ 3 marks ] What is the usual measure of efficiency for large file sorts? Explain.

[ 4 marks ] Explain how the heapsort process can overlap with output.

[ 6 marks ] You have a very large file to sort. You have decided to sort it using heapsort, breaking it up into some large number of sorted runs and finally merging those runs using a polyphase merge. What do you do if the number of runs is not a Fibonacci number? Suggest 3 possible solutions.

[ 4 marks ] What is a Balanced Merge-Sort and in what key way does it differ from a Polyphase Merge-Sort?

[ 2 marks ] What aspect of Quicksort makes it a poor choice for sorting large files?

[ 3 marks ] Why do we ignore seeks for output when counting seeks in a multiphase merge?

LONG QUESTIONS

[ 10 marks ] Suppose we simply wanted to count passes over the data to get a measure of the amount of work being done by a particular sort or merge. Let's say that one complete pass (over all records in the file) is given a value of 100. This would mean that the initial production of sorted runs would have a "pass coefficient" of 100. Passes over the data once it is in memory are not counted.
Calculate and compare the number of passes required to merge 50 runs:
A) using a 50-way merge
B) using a multiphase merge (5 sets of 10 runs)
Explain your results.
[ 15 marks ] The Polyphase Merge-Sort was developed for sorting large files on tape. This algorithm can also be used for efficient sorting of large files on disk. Explain (what makes it efficient; what are the system/hardware requirements; etc.).
[ 10 marks ] Compare and contrast move mode vs locate mode for buffering I/O. (Limited credit will be given for information taken directly from the course notes or textbook. Your answer must be in your own words and must go beyond a re-wording of the given notes.)
(worth 15 total) Which strategy would you use to sort a file of 2 million 1K records on each of the following devices? Justify your answers. If you are comparing more than one approach, explain how you would make your decision.
a) TAPE (such as DAT)
b) "regular" DISKS; less than 4 drives available
c) "regular" DISKS; more than 4 drives available
d) RAID (5 disk; with striping capability)
e) CD-ROM
(worth 20 total)
Suppose you have a file consisting of 250,000 records. Each record is 1,000 bytes long and you have 1 megabyte of RAM at your disposal. What will be the cost, measured in seeks, of sorting the file:
a) Using heapsort and a simple K-way merge
b) Using heapsort and a 2-phase merge
c) Using replacement selection sort and a simple K-way merge
d) Using replacement selection sort and a 2-phase merge
(worth 15 total)
How could you adapt the algorithm for doing the simulation of the Fibonacci merge to do an actual merge? One version of the simulation follows. You may use it as a guide if you wish (it's not required). You don't need to write code for this question; just answer the following questions as clearly as possible.
A) What information would go in the device stacks (in the code that follows we currently put an integer indicating the file's size)?
B) How would you handle 'dummy runs'?
C) How would 'do_round' need to be re-designed to actually do the merge? (Code is not necessary in your answer as long as the explanation is clear!! Pictures are good.)
Code is at: Code/SetUp.txt