CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified July 31, 2000 04:23 PM 

Cosequential Processing Examples

6th Order Tables for:
Cascade Merge
Level     1     2     3     4     5     6   total
1         1     1     1     1     1     1       6
2         6     5     4     3     2     1      21
3        21    20    18    15    11     6      91
4        91    85    74    59    41    21     371
5       371   350   309   250   176    91    1547
6      1547  1456  1280  1030   721   371    6405
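
Each row of the cascade table can be derived from the one before it. A minimal Python sketch, assuming the recurrence that the numbers above appear to follow (column k of the next row is the sum of the first 6 - k + 1 entries of the current row); the name `cascade_table` is ours, not part of the course material:

```python
def cascade_table(order, levels):
    """Return the cascade-merge run distribution for levels 1..levels.

    Assumed recurrence (inferred from the table): column k of the next
    level is the sum of the first (order - k + 1) entries of this level.
    """
    row = [1] * order
    rows = [row]
    for _ in range(levels - 1):
        # k is 0-based here, so column k sums the first (order - k) entries.
        row = [sum(row[:order - k]) for k in range(order)]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for level, row in enumerate(cascade_table(6, 6), start=1):
        print(level, *row, sum(row))
```

Calling `cascade_table(3, 4)` gives the corresponding 3rd order table.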
 
Polyphase Merge
Level     1     2     3     4     5     6   total
1         1     1     1     1     1     1       6
2         2     2     2     2     2     1      11
3         4     4     4     4     3     2      21
4         8     8     8     7     6     4      41
5        16    16    15    14    12     8      81
6        32    31    30    28    24    16     161
7        63    62    60    56    48    32     321
8       125   123   119   111    95    63     636
9       248   244   236   220   188   125    1261
10      492   484   468   436   373   248    2501
11      976   960   928   865   740   492    4961
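
The polyphase table can be generated the same way. A minimal Python sketch, assuming the recurrence that the rows above appear to follow (each column of the next row is the first entry of the current row plus the entry one column to its right, with 0 past the last column); the name `polyphase_table` is ours:

```python
def polyphase_table(order, levels):
    """Return the polyphase-merge run distribution for levels 1..levels.

    Assumed recurrence (inferred from the table): next[k] = row[0] + row[k+1],
    where row[order] is taken to be 0.
    """
    row = [1] * order
    rows = [row]
    for _ in range(levels - 1):
        first = row[0]
        # Shift the row left by one, pad with 0, and add the old first entry.
        row = [first + right for right in row[1:] + [0]]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for level, row in enumerate(polyphase_table(6, 11), start=1):
        print(level, *row, sum(row))
```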
 
Calculating Seeks....
 
RAM available: 20 MBytes (20,000,000)
 
20 Million Records @ 2000 Bytes/Record [ so we can fit 10,000 records in RAM at 1 time]
 
Straight Heapsort produces 2000 runs of 10,000 records per run.
 
Time to split into 2000 sorted runs:
2000 seeks in + 2000 seeks out
= 4000 seeks @ 3 ms = 4000 * 3/1000 = 12 seconds
[ NOTE: this is seek time only, not latency or transfer ]
 
Regular K-Way Merge:
: 2000 runs; room for 5 records from each run in RAM
= 1/2000 of each run in memory at a time
= 2000 seeks to read in all of one run; there are 2000
of them so 2000 * 2000 = 4,000,000 seeks
: 4,000,000 * 3 ms = 12,000 seconds = 200 minutes (> 3 hr.)
3 Hours, 20 Minutes to do the merge [seek time only]
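
The run-creation and K-way merge arithmetic above can be checked with a short Python sketch. It assumes, as the notes do, that RAM is split evenly among the runs and that one seek refills one buffer; the function and variable names are ours:

```python
import math

SEEK_MS = 3                 # seek time only; latency and transfer are ignored
RAM_BYTES = 20_000_000
RECORD_BYTES = 2_000
NUM_RECORDS = 20_000_000


def kway_merge_seeks(num_runs, records_per_run, ram_records):
    """Seeks to merge all runs in one pass, RAM shared evenly among the runs."""
    buffer_records = ram_records // num_runs            # records held per run
    seeks_per_run = math.ceil(records_per_run / buffer_records)
    return num_runs * seeks_per_run


ram_records = RAM_BYTES // RECORD_BYTES     # records that fit in RAM at once
num_runs = NUM_RECORDS // ram_records       # one sorted run per RAM load
split_seeks = 2 * num_runs                  # one seek in + one seek out per run
merge_seeks = kway_merge_seeks(num_runs, ram_records, ram_records)

print(split_seeks * SEEK_MS / 1000, "seconds of seeking to create the runs")
print(merge_seeks * SEEK_MS / 1000 / 60, "minutes of seeking to merge")
```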
 
2-Step Merge:
: Break it up into 50 sets of 40 runs each
: To merge 40 runs:
- 1/40th of RAM for each run = 500,000 Bytes
- room for 500,000 Bytes / 2000 Bytes/Record = 250 records from each run
- 1 run has 10,000 records so 250/10,000 = 1/40th of a run
- can read 1/40th of a run with 1 seek so 40 seeks to read one run
- there are 40 of them so 40 * 40 = 1600 seeks to do 1 set
- there are 50 sets to do so 50 * 1600 = 80,000 seeks
 
80,000 seeks for phase 1
 
: To merge 50 runs (each 400,000 Records long - 40*10,000)
- 1/50th of RAM for each run = 400,000 Bytes
- room for 400,000 Bytes / 2000 Bytes/Record = 200 records from each run
- 1 run has 400,000 records so 200/400,000 = 1/2000th of a run
- can read 1/2000th of a run with 1 seek; 2000 seeks to read 1 run
- there are 50 of them so 50 * 2000 = 100,000 seeks for phase 2
100,000 seeks for phase 2
 
180,000 seeks * 3ms = 540 sec. = 9 minutes
9 Minutes to do the merge [cool]
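
The two-step count above generalizes to any set size. A minimal Python sketch under the same assumptions (RAM shared evenly among the runs being merged, one seek per buffer refill, and a set size that divides the number of runs exactly); `merge_seeks` and `two_step_seeks` are hypothetical helper names:

```python
import math


def merge_seeks(num_runs, records_per_run, ram_records):
    """Seeks to merge num_runs equal-length runs with RAM shared evenly."""
    buffer_records = ram_records // num_runs
    return num_runs * math.ceil(records_per_run / buffer_records)


def two_step_seeks(total_runs, records_per_run, ram_records, runs_per_set):
    """Return (step 1 seeks, step 2 seeks) for a two-step merge."""
    sets = total_runs // runs_per_set          # assumes an exact division
    step1 = sets * merge_seeks(runs_per_set, records_per_run, ram_records)
    step2 = merge_seeks(sets, runs_per_set * records_per_run, ram_records)
    return step1, step2


step1, step2 = two_step_seeks(2000, 10_000, 10_000, runs_per_set=40)
print(step1, step2, (step1 + step2) * 3 / 1000 / 60, "minutes")
```

Trying other values of `runs_per_set` (for example 20, 50, or 100) shows how the balance between the two steps shifts.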
 
What about Polyphase Merge? How to calculate seeks for this?
 
Let's start simple....
say, 17 runs 3rd order merge;
1,000 records/run; 1,000 bytes/record; 1 MB RAM
Amount in square brackets indicates relative size of run; [1] = 1,000 records
 
Phase  Dev 1   Dev 2   Dev 3   Dev 4
0      7[1]    6[1]    4[1]    0       initial distribution
1      3[1]    2[1]    0       4[3]
2      1[1]    0       2[5]    2[3]
3      0       1[9]    1[5]    1[3]
4      1[17]   0       0       0
 
Phase 0: Initial Distribution which can be done when runs are first created.
 
Phase 1: Merge 4 sets of 3 runs
 
Phase 2: Merge 2 sets of 3 runs this way:
: same as above, but we have 2000 more records in one of the runs
(2 "run lengths"), which means 1 run will require more seeks to merge than the other 2
 
Phase 3: Merge 1 set of 3 runs this way:
: merge 1 "run length" 3 ways, then
: merge 2 "run lengths" 2 ways, then
: merge 2 "run lengths" 1 way
(this does not happen in chronological order, but it amounts to the same number of seeks for each run in the end)
 
Phase 4: Merge 1 set of 3 runs this way:
: merge 3 "run lengths" 3 ways, then
: merge 2 "run lengths" 2 ways, then
: merge 4 "run lengths" 1 way
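
The phase table can be reproduced by simulating the merges. A minimal Python sketch, assuming each phase writes to the one empty device and merges as many sets as the shortest input device allows; `polyphase_phases` is a hypothetical helper, and devices are numbered from 0 in the code:

```python
def polyphase_phases(devices):
    """Return the device contents at phase 0 and after every merge phase.

    `devices` is a list of lists of relative run lengths. At least one
    device must be empty at the start of each phase (the output device).
    """
    devices = [list(d) for d in devices]
    history = [[list(d) for d in devices]]
    while sum(len(d) for d in devices) > 1:
        out = next(i for i, d in enumerate(devices) if not d)
        inputs = [i for i, d in enumerate(devices) if d]
        sets = min(len(devices[i]) for i in inputs)
        for _ in range(sets):
            # Merge one run from every input device into one longer run.
            devices[out].append(sum(devices[i].pop(0) for i in inputs))
        history.append([list(d) for d in devices])
    return history


def show(device):
    """Format a device the way the table does, e.g. 4[3] or 0."""
    return f"{len(device)}[{device[0]}]" if device else "0"


start = [[1] * 7, [1] * 6, [1] * 4, []]
for phase, state in enumerate(polyphase_phases(start)):
    print(phase, *(show(d) for d in state))
```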
 
Now, to count seeks....
 
Phase 0: we don't count seeks here, they are counted when we create the runs
 
Phase 1: 4 sets of 3 runs:
: 1/3 of RAM for each = 333,333 Bytes = 333 Records
: 1/3 of each run = 3 seeks/run * 3 runs = 9 seeks
: 4 sets to do = 9 * 4 = 36 seeks [36 SEEKS]
 
Phase 2: 2 sets of 3 runs (1 run in each set has 3,000 records in it)
: 1/3 RAM = 333 records
: 3 seeks/run * 3 runs = 9 seeks, but last file has 2,000 records left,
so 6 more seeks for it = 15 seeks per set
: 2 sets = 2 * 15 = 30 seeks [30 SEEKS]
 
Phase 3: 1 set of 3 runs (1 has 1000 rec., 1 has 3000 rec., 1 has 5000 rec.)
: 3 seeks/run * 3 runs = 9 seeks, but 2 runs have at least 2000 records
left, so 6 more seeks for each; 6 * 2 = 12 MORE seeks, but
the last run still has 2000 records left so 6 MORE for it
: 9 seeks + 12 seeks + 6 seeks = 27 seeks [27 SEEKS]
 
Phase 4: 1 set of 3 runs (lengths 9000, 5000, 3000 records)
: smallest size is 3000 rec. so 1/3 RAM = 9 seeks/run so
9 seeks/run * 3 runs = 27 seeks
: now 2 runs still have at least 2000 records left so
6 seeks for each * 2 of them = 12 seeks
: now last run has 4000 records left so 3*4 = 12 seeks
: 27 seeks + 12 seeks + 12 seeks = 51 seeks [51 SEEKS]

36 + 30 + 27 + 51 = 144 SEEKS TOTAL
 
Compare with 17-Way Merge:
1/17th of RAM for each run =~ 58,823 Bytes
= 58 records from each run
= 1/17th of a run
: 17 seeks for each run * 17 runs = 289 seeks
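
The sequential counting rule used above boils down to: every "run length" read costs as many seeks as there are buffers, because each buffer holds that fraction of a unit run. A minimal Python sketch of that rule, assuming (as the notes do) that the buffer size stays fixed even after some runs in a set are exhausted, and that 333 records is treated as exactly 1/3 of a run; all names are ours:

```python
ORDER = 3  # 3rd order merge: RAM is split into 3 buffers of about 1/3 run

# Each phase is a list of merge sets; each set gives the relative lengths
# of the runs being merged ([1] = 1,000 records), read off the phase table.
PHASES = [
    [(1, 1, 1)] * 4,   # phase 1: 4 sets of 3 unit runs
    [(1, 1, 3)] * 2,   # phase 2: 2 sets, one run of 3,000 records in each
    [(1, 3, 5)],       # phase 3
    [(9, 5, 3)],       # phase 4
]


def sequential_seeks(merge_set, buffers_per_unit_run=ORDER):
    """Seeks for one merge set when only one buffer refill happens at a time."""
    return buffers_per_unit_run * sum(merge_set)


per_phase = [sum(sequential_seeks(s) for s in phase) for phase in PHASES]
polyphase_total = sum(per_phase)

# Single 17-way merge: 17 buffers of about 1/17 run each, 17 unit runs.
seventeen_way = sequential_seeks((1,) * 17, buffers_per_unit_run=17)

print(per_phase, polyphase_total, seventeen_way)
```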
 
BUT!!!!
 
The big advantage of the Polyphase Merge is that we are using >1 device so we
can read simultaneously from each device, so.....
(if 3 devices serve as sources for runs we can do 3 seeks at the same time)
 
Phase 1: 4 sets of 3 runs
: 1/3 RAM for each = 333 records
: 3 seeks/ run (read concurrently) = 3 seeks
: 4 sets to do so 4 * 3 = 12 seeks [12 SEEKS]
 
Phase 2: 2 sets of 3 runs (1 run in each set has 3000 records in it)
: still 333 records from each run with 1 seek
: 3 seeks/run (read concurrently) = 3 seeks BUT
last run has 2000 records left so 6 more seeks
: = 9 seeks per set
: 2 sets to do so 2 * 9 = 18 seeks [18 SEEKS]

Phase 3: 1 set of 3 runs (1000, 3000, 5000 records respectively)
: 3 seeks/run = 3 seeks, but 2 runs have at least 2000 records left
so 6 more for each (read concurrently) = 6 MORE seeks, but
last run still has 2000 records left so 6 more for it
: 3 seeks + 6 seeks + 6 seeks = 15 seeks [15 SEEKS]

Phase 4: 1 set of 3 runs, lengths 9000, 5000, 3000
: smallest size is 3000, so 1/3 RAM = 9 seeks/run (read concurrently)
: now we still have at least 2000 records left in 2 runs, so
6 seeks for each (read concurrently)
: AND last run still has 4000 records left so 4 * 3 = 12 seeks
: 9 seeks + 6 seeks + 12 seeks = 27 seeks [27 SEEKS]

12 + 18 + 15 + 27 = 72 SEEKS for Polyphase Merge
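
With one device per input run, the refills for the runs in a set overlap, so a set costs as many seek times as its longest run needs. A minimal Python sketch of that rule, assuming a fixed buffer of 1/3 of a unit run and that each set is charged separately (so a phase with two sets costs twice its per-set figure); all names are ours:

```python
ORDER = 3  # 3 input devices, 3 buffers of about 1/3 run each

# Relative run lengths per merge set, read off the phase table.
PHASES = [
    [(1, 1, 1)] * 4,   # phase 1: 4 sets
    [(1, 1, 3)] * 2,   # phase 2: 2 sets
    [(1, 3, 5)],       # phase 3
    [(9, 5, 3)],       # phase 4
]


def concurrent_seeks(merge_set, buffers_per_unit_run=ORDER):
    """Seek times for one set when every input device can seek at once."""
    return buffers_per_unit_run * max(merge_set)


per_set = [concurrent_seeks(phase[0]) for phase in PHASES]
per_phase = [sum(concurrent_seeks(s) for s in phase) for phase in PHASES]

print("per set:  ", per_set)
print("per phase:", per_phase, "total:", sum(per_phase))
```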
 
Compare with Cascade:
3rd order table:
Level     1     2     3   total
1         1     1     1       3
2         3     2     1       6
3         6     5     3      14
4        14    11     6      31
 
There is no 'perfect' distribution, so save one set of 3 and merge the other 14 runs
 
Initial Distribution:
Step  Dev 1  Dev 2  Dev 3  Dev 4
0     6[1]   5[1]   3[1]   0       {merge 3 sets of 3 concurrently = 9 seeks}
1a    3[1]   2[1]   0      3[3]    {merge 2 sets of 2 concurrently from devices 1 & 2; leave 4 alone = 4 seeks}
1b    1[1]   0      2[2]   3[3]    {merge 1 set of 1000, 2000, 3000 = 9 seeks}
2a    0      1[6]   1[2]   2[3]    {merge 1 set of 2000, 3000 from devices 3 & 4; leave 2 alone = 9 seeks}
2b    1[5]   1[6]   0      1[3]    {now merge 1 set of 3000, 5000, 6000 = 18 seeks}
3     0      0      1[14]  0       {merge the 3 we were holding: 3 seeks = 1[3]}
 
Merge 1[3] with 1[14] = 42 seeks (takes as many seeks as the longest run)
9 seeks + 4 seeks + 9 seeks + 9 seeks + 18 seeks + 3 seeks + 42 seeks = 94 seeks
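
The device contents in the cascade steps above can be checked by replaying the merges. A minimal Python sketch, assuming each step is described by its input devices, the number of sets merged, and the output device (our reading of the step annotations); it tracks run counts and lengths only, not seeks, and `replay` is a hypothetical helper:

```python
def replay(devices, steps):
    """Apply each (input_devices, sets, output_device) step in turn and
    return the device contents after every step."""
    devices = {k: list(v) for k, v in devices.items()}
    states = []
    for inputs, sets, out in steps:
        for _ in range(sets):
            # Take one run from each input device and merge them into one.
            devices[out].append(sum(devices[i].pop() for i in inputs))
        states.append({k: list(v) for k, v in devices.items()})
    return states


# 14 of the 17 runs; the other 3 are held back and merged separately.
start = {1: [1] * 6, 2: [1] * 5, 3: [1] * 3, 4: []}
steps = [
    ((1, 2, 3), 3, 4),  # 0  -> 1a: three 3-way merges onto device 4
    ((1, 2), 2, 3),     # 1a -> 1b: two 2-way merges onto device 3
    ((1, 3, 4), 1, 2),  # 1b -> 2a: 1000 + 2000 + 3000 onto device 2
    ((3, 4), 1, 1),     # 2a -> 2b: 2000 + 3000 onto device 1
    ((1, 2, 4), 1, 3),  # 2b -> 3:  5000 + 6000 + 3000 onto device 3
]

for label, state in zip(["1a", "1b", "2a", "2b", "3"], replay(start, steps)):
    print(label, state)
```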
 
In this case this arrangement is worse than Polyphase merge but still better than 17-Way Merge.
 
There may be a better way to arrange the initial distribution (maybe with replacement selection heapsort to build the runs we can end up with 14 slightly longer runs).
 
In general, the decision about which Merge algorithm is best will depend on at LEAST the following:
1. number of devices available
2. amount of RAM at your disposal
3. number of initial runs
4. the blocking factor (how many records can you be guaranteed to get with one seek). Remember, you don't ever want to read part of a record and you can never read part of a block to save seeks.

