Cascade and Polyphase Examples

CPSC 461: Copyright © 2002 Katrin Becker 1998-2002 Last Modified July 31, 2000 04:23 PM

Cosequential Processing Examples

6th Order Tables for:

Cascade Merge

0	1	2	3	4	5	6	total
1	1	1	1	1	1	1	6
2	6	5	4	3	2	1	21
3	21	20	18	15	11	6	91
4	91	85	74	59	41	21	371
5	371	350	309	250	176	91	1547
6	1547	1456	1280	1030	721	371	6405
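
Each row of the cascade table can be built from the row above it: device 1 gets the sum of the whole previous row, device 2 the sum of all but the last entry, and so on down to the last device, which gets only the first entry. A minimal Python sketch of that rule (the name `cascade_table` is made up for these notes, it is not a library function):

```python
def cascade_table(order, levels):
    """Return the first `levels` rows of perfect cascade run counts.

    Entry k (counted from 0) of a new row is the sum of the first
    (order - k) entries of the previous row.
    """
    row = [1] * order
    rows = [row]
    for _ in range(levels - 1):
        row = [sum(row[:order - k]) for k in range(order)]
        rows.append(row)
    return rows


# Print level, run count per device, and the row total (6th order, 6 levels).
for level, row in enumerate(cascade_table(6, 6), start=1):
    print(level, *row, sum(row), sep="\t")
```

Calling `cascade_table(3, 4)` gives the 3rd order rows in the same way.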

Polyphase Merge

0	1	2	3	4	5	6	total
1	1	1	1	1	1	1	6
2	2	2	2	2	2	1	11
3	4	4	4	4	3	2	21
4	8	8	8	7	6	4	41
5	16	16	15	14	12	8	81
6	32	31	30	28	24	16	161
7	63	62	60	56	48	32	321
8	125	123	119	111	95	63	636
9	248	244	236	220	188	125	1261
10	492	484	468	436	373	248	2501
11	976	960	928	865	740	492	4961
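
The polyphase rows follow a different rule: every device except the last gets the previous row's first entry plus the previous entry of the device to its right, and the last device gets the previous row's first entry. A short Python sketch of this (again, `polyphase_table` is just an illustrative name):

```python
def polyphase_table(order, levels):
    """Return the first `levels` rows of perfect polyphase run counts."""
    row = [1] * order
    rows = [row]
    for _ in range(levels - 1):
        first = row[0]
        # device i gets first + old device i+1; the last device gets first
        row = [first + nxt for nxt in row[1:]] + [first]
        rows.append(row)
    return rows


# Print level, run count per device, and the row total (6th order, 11 levels).
for level, row in enumerate(polyphase_table(6, 11), start=1):
    print(level, *row, sum(row), sep="\t")
```

With `order=3` the fourth row is 7, 6, 4 (17 runs in total).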

Calculating Seeks....

RAM available: 20 MBytes (20,000,000)

20 Million Records @ 2000 Bytes/Record [so we can fit 10,000 records in RAM at one time]

Straight Heapsort produces 2000 runs of 10,000 records per run.

Time to split into 2000 sorted runs:

2000 seeks in + 2000 seeks out
= 4000 seeks @ 3 ms = 4000 * 3/1000 = 12 seconds
[ NOTE: this is seek time only, not latency or transfer ]

Regular K-Way Merge:

: 2000 runs; room for 5 records from each run in RAM
= 1/2000 of each run in memory at a time
= 2000 seeks to read in all of one run; there are 2000
of them so 2000 * 2000 = 4,000,000 seeks
: 4,000,000 * 3 ms = 12,000 seconds = 200 minutes (> 3 hr.)
3 Hours, 20 Minutes to do the merge [seek time only]
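
The arithmetic for creating the runs and for the single-pass merge can be checked with a few lines of Python. This is only a sketch under the same assumptions as the notes: all runs are the same size, RAM is split evenly into one input buffer per run, and every buffer refill costs one seek. The function and variable names are made up for illustration.

```python
SEEK_MS = 3  # seek time per access, as given in the notes


def kway_merge_seeks(num_runs, records_per_run, ram_records):
    """Seeks needed to merge num_runs equal-sized runs in one pass."""
    buffer_records = ram_records // num_runs               # records per input buffer
    seeks_per_run = -(-records_per_run // buffer_records)  # ceiling division
    return num_runs * seeks_per_run


ram_records = 20_000_000 // 2000        # 10,000 records fit in RAM
num_runs = 20_000_000 // ram_records    # 2000 initial runs
split_seeks = 2 * num_runs              # one seek in + one seek out per run
merge_seeks = kway_merge_seeks(num_runs, ram_records, ram_records)

print(split_seeks * SEEK_MS / 1000, "seconds of seeking to create the runs")
print(merge_seeks * SEEK_MS / 1000 / 60, "minutes of seeking to merge them")
```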

2-Step (Multi-Step) Merge:

: Break it up into 50 sets of 40 runs each
: To merge 40 runs:

- 1/40th of RAM for each run = 500,000 Bytes
- room for 500,000 Bytes / 2000 Bytes/Record = 250 records from each run
- 1 run has 10,000 records so 250/10,000 = 1/40th of a run
- can read 1/40th of a run with 1 seek so 40 seeks to read one run
- there are 40 of them so 40 * 40 = 1600 seeks to do 1 set
- there are 50 sets to do so 50 * 1600 = 80,000 seeks

80,000 seeks for phase 1

: To merge 50 runs (each 400,000 records long = 40 * 10,000)

- 1/50th of RAM for each run = 400,000 Bytes
- room for 400,000 Bytes / 2000 Bytes/Record = 200 records from each run
- 1 run has 400,000 records so 200/400,000 = 1/2000th of a run
- can read 1/2000th of a run with 1 seek; 2000 seeks to read 1 run
- there are 50 of them so 50 * 2000 = 100,000 seeks

100,000 seeks for phase 2

180,000 seeks * 3ms = 540 sec. = 9 minutes
9 Minutes to do the merge [cool]
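
The same counting can be sketched in Python so that other group sizes are easy to try. It assumes the group size divides the number of runs evenly and that RAM is split evenly among the runs being merged in each step; `merge_pass_seeks` and `two_step_seeks` are illustrative names only.

```python
def merge_pass_seeks(num_runs, records_per_run, ram_records):
    """Seeks to merge num_runs equal runs, one RAM buffer per run."""
    buffer_records = ram_records // num_runs
    return num_runs * -(-records_per_run // buffer_records)  # ceiling division


def two_step_seeks(total_runs, records_per_run, ram_records, group_size):
    """Return (step 1 seeks, step 2 seeks) for a two-step merge."""
    groups = total_runs // group_size  # assumes group_size divides total_runs
    step1 = groups * merge_pass_seeks(group_size, records_per_run, ram_records)
    step2 = merge_pass_seeks(groups, group_size * records_per_run, ram_records)
    return step1, step2


step1, step2 = two_step_seeks(2000, 10_000, 10_000, group_size=40)
total = step1 + step2
print(step1, step2, total, total * 3 / 1000 / 60, "minutes")
```

Changing `group_size` (to another divisor of 2000) shows how the work shifts between the two steps.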

What about Polyphase Merge? How to calculate seeks for this?

Let's start simple....

say, 17 runs 3rd order merge; 1,000 records/run; 1,000 bytes/record; 1 MB RAM

Amount in square brackets indicates relative size of run; [1] = 1,000 records

Phase \ Device:	1	2	3	4
0	7[1]	6[1]	4[1]	0	initial distribution
1	3[1]	2[1]	0	4[3]
2	1[1]	0	2[5]	2[3]
3	0	1[9]	1[5]	1[3]
4	1[17]	0	0	0
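
The table can be reproduced by simulating the devices directly. In this sketch each device is a list of relative run lengths; in every phase the empty device receives the output, and we merge as many sets as the shortest input device has runs. It assumes a perfect distribution (exactly one empty device at the start of each phase); the names are made up for illustration.

```python
def polyphase_phases(devices):
    """Return the device contents after every phase of a polyphase merge."""
    devices = [list(d) for d in devices]
    history = [[list(d) for d in devices]]
    while sum(len(d) for d in devices) > 1:
        out = next(i for i, d in enumerate(devices) if not d)  # empty device
        inputs = [i for i in range(len(devices)) if i != out]
        sets = min(len(devices[i]) for i in inputs)
        if sets == 0:
            raise ValueError("not a perfect polyphase distribution")
        for _ in range(sets):
            # merge one run from every input device into one longer run
            devices[out].append(sum(devices[i].pop() for i in inputs))
        history.append([list(d) for d in devices])
    return history


def show(device):
    """Format a device as count[length]; runs on one device share a length."""
    return f"{len(device)}[{device[0]}]" if device else "0"


initial = [[1] * 7, [1] * 6, [1] * 4, []]
for phase, state in enumerate(polyphase_phases(initial)):
    print(phase, *(show(d) for d in state), sep="\t")
```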

Phase 0: Initial Distribution which can be done when runs are first created.

Phase 1: Merge 4 sets of 3 runs

Phase 2: Merge 2 sets of 3 runs this way:

: same as above but we have 2000 more records in one of the runs (2 "run lengths"), which means 1 run will require more seeks to merge than the other 2

Phase 3: Merge 1 set of 3 runs this way:

: merge 1 "run length" 3 ways, then
: merge 2 "run lengths" 2 ways, then
: merge 2 "run lengths" 1 way
( this does not happen in chronological order but amounts to the same number of seeks for each run in the end)

Phase 4: Merge 1 set of 3 runs this way:

: merge 3 "run lengths" 3 ways, then
: merge 2 "run lengths" 2 ways, then
: merge 4 "run lengths" 1 way
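
The "so many run lengths, so many ways" breakdown falls out of sorting the runs in a set by length: the merge stays N-way until the shortest remaining run is used up, then drops by one. A small Python sketch (`merge_breakdown` is an illustrative name, and lengths are relative, with 1 = 1,000 records):

```python
def merge_breakdown(lengths):
    """Return a list of (run_lengths, ways) steps for merging one set of runs."""
    steps = []
    done = 0                      # run lengths already consumed from every run
    remaining = sorted(lengths)
    while remaining:
        shortest = remaining[0]
        if shortest > done:
            # all remaining runs take part until the shortest one runs out
            steps.append((shortest - done, len(remaining)))
            done = shortest
        remaining.pop(0)
    return steps


for lengths in ([1, 1, 1], [1, 1, 3], [1, 3, 5], [9, 5, 3]):
    print(lengths, merge_breakdown(lengths))
```

A quick sanity check is that run lengths times ways, summed over the steps, equals the total length of the runs in the set.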

Now, to count seeks....

Phase 0: we don't count seeks here, they are counted when we create the runs

Phase 1: 4 sets of 3 runs:

: 1/3 of RAM for each = 333,333 Bytes = 333 Records
: 1/3 of each run = 3 seeks/run * 3 runs = 9 seeks
: 4 sets to do = 9 * 4 = 36 seeks [36 SEEKS]

Phase 2: 2 sets of 3 runs (1 run in each set has 3,000 records in it)

: 1/3 RAM = 333 records
: 3 seeks/run * 3 runs = 9 seeks, but last file has 2,000 records left, so 6 more seeks for it = 15 seeks per set
: 2 sets = 2 * 15 = 30 seeks [30 SEEKS]

Phase 3: 1 set of 3 runs (1 has 1000 rec., 1 has 3000 rec., 1 has 5000 rec.)

: 3 seeks/run * 3 runs = 9 seeks, but 2 runs have at least 2000 records left, so 6 more seeks for each; 6 * 2 = 12 MORE seeks, but the last run still has 2000 records left so 6 MORE for it
: 9 seeks + 12 seeks + 6 seeks = 27 seeks [27 SEEKS]

Phase 4: 1 set of 3 runs (lengths 9000, 5000, 3000 records)

: smallest size is 3000 rec. so 1/3 RAM = 9 seeks/run so 9 seeks/run * 3 runs = 27 seeks
: now 2 runs still have at least 2000 records left so 6 seeks for each * 2 of them = 12 seeks
: now last run has 4000 records left so 3*4 = 12 seeks
: 27 seeks + 12 seeks + 12 seeks = 51 seeks [51 SEEKS]

36 + 30 + 27 + 51 = 144 SEEKS TOTAL
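
When the reads are not overlapped, every "run length" read costs 3 seeks (1,000 records through a 333-record buffer, rounded the way the notes round it), no matter how many ways it is being merged. So a phase costs 3 seeks for every run length it reads. A Python sketch of that tally, with the phase data typed in by hand from the table (names are illustrative):

```python
SEEKS_PER_RUN_LENGTH = 3  # 1,000-record run length / 333-record buffer, as rounded in the notes

# (number of sets, relative lengths of the runs merged in each set)
phases = [
    (4, [1, 1, 1]),  # phase 1
    (2, [1, 1, 3]),  # phase 2
    (1, [1, 3, 5]),  # phase 3
    (1, [9, 5, 3]),  # phase 4
]


def sequential_seeks(sets, lengths):
    """Seeks for one phase when only one run can be read at a time."""
    return sets * SEEKS_PER_RUN_LENGTH * sum(lengths)


per_phase = [sequential_seeks(sets, lengths) for sets, lengths in phases]
print(per_phase, "total:", sum(per_phase))
```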

Compare with 17-Way Merge:

1/17th of RAM for each run ≈ 58,823 Bytes = 58 records from each run ≈ 1/17th of a run
: 17 seeks for each run * 17 runs = 289 seeks

BUT!!!!

The big advantage of the Polyphase Merge is that we are using >1 device so we can read simultaneously from each device, so.....

(if 3 devices serve as sources for runs we can do 3 seeks at the same time)

Phase 1: 4 sets of 3 runs

: 1/3 RAM for each = 333 records
: 3 seeks/ run (read concurrently) = 3 seeks
: 4 sets to do so 4 * 3 = 12 seeks [12 SEEKS]

Phase 2: 2 sets of 3 runs (1 run in each set has 3000 records in it)

: still 333 records from each run with 1 seek
: 3 seeks/run (read concurrently) = 3 seeks BUT last run has 2000 records left so 6 more seeks = 9 seeks per set
: 2 sets = 2 * 9 = 18 seeks [18 SEEKS]

Phase 3: 1 set of 3 runs (1000, 3000, 5000 records respectively)

: 3 seeks/run = 3 seeks, but 2 runs have at least 2000 records left so 6 more for each (read concurrently) = 6 MORE seeks, but last run still has 2000 records left so 6 more for it
: 3 seeks + 6 seeks + 6 seeks = 15 seeks [15 SEEKS]

Phase 4: 1 set of 3 runs, lengths 9000, 5000, 3000

: smallest size is 3000, so 1/3 RAM = 9 seeks/run (read concurrently)
: now we still have at least 2000 records left in 2 runs, so 6 seeks for each (read concurrently)
: AND last run still has 4000 records left so 4 * 3 = 12 seeks
: 9 seeks + 6 seeks + 12 seeks = 27 seeks [27 SEEKS]

12 + 18 + 15 + 27 = 72 SEEKS for Polyphase Merge
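
With one read in flight per device, a set costs only as many seeks as its longest run needs, and the sets within a phase are still merged one after another. A Python sketch of the concurrent count under those assumptions (phase data typed in by hand, names illustrative); note that phase 2 has two sets, so the per-set cost is counted twice:

```python
SEEKS_PER_RUN_LENGTH = 3  # 1,000-record run length / 333-record buffer, as rounded in the notes

# (number of sets, relative lengths of the runs merged in each set)
phases = [
    (4, [1, 1, 1]),  # phase 1
    (2, [1, 1, 3]),  # phase 2
    (1, [1, 3, 5]),  # phase 3
    (1, [9, 5, 3]),  # phase 4
]


def concurrent_seeks(sets, lengths):
    """Seeks for one phase when all input devices seek at the same time."""
    return sets * SEEKS_PER_RUN_LENGTH * max(lengths)


per_phase = [concurrent_seeks(sets, lengths) for sets, lengths in phases]
print(per_phase, "total:", sum(per_phase))
```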

Compare with Cascade:

3rd order table:

0	1	2	3	total
1	1	1	1	3
2	3	2	1	6
3	6	5	3	14
4	14	11	6	31

There is no 'perfect' distribution, so save one set of 3 and merge the other 14 runs
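
Picking the distribution can be sketched as: walk down the cascade table until the row total would exceed the number of runs, use the last row that fits, and hold back whatever is left over. A minimal Python version (function names are made up for these notes):

```python
def cascade_rows(order):
    """Yield successive rows of perfect cascade run counts."""
    row = [1] * order
    while True:
        yield row
        row = [sum(row[:order - k]) for k in range(order)]


def largest_perfect(order, num_runs):
    """Return the largest perfect cascade row whose total is <= num_runs."""
    best = None
    for row in cascade_rows(order):
        if sum(row) > num_runs:
            return best
        best = row


row = largest_perfect(3, 17)
held_back = 17 - sum(row)
print("distribute", row, "and hold back", held_back, "runs")
```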

Initial Distribution:

-	1	2	3	4
0	6[1]	5[1]	3[1]	0	{merge 3 sets of 3 concurrently = 9 seeks}
1a	3[1]	2[1]	0	3[3]	{merge 2 sets of 2 concurrently from device 1 & 2; leave 4 alone = 4 seeks}
1b	1[1]	0	2[2]	3[3]	{merge 1 set of 1000,2000,3000 = 9 seeks}
2a	0	1[6]	1[2]	2[3]	{merge 1 set of 2000,3000 from device 3 & 4; leave 2 alone = 9 seeks}
2b	1[5]	1[6]	0	1[3]	{now merge 1 set of 3000,5000,6000 = 18 seeks}
3	0	0	1[14]	0	{merge the 3 runs we were holding: 3 seeks, giving 1[3]}

Merge 1[3] with 1[14] = 42 seeks (takes as many seeks as the longest run)

9 seeks + 4 seeks + 9 seeks + 18 seeks + 3 seeks + 42 seeks = 85 seeks

In this case this arrangement is worse than Polyphase merge but still better than 17-Way Merge.

There may be a better way to arrange the initial distribution (for example, building the runs with replacement selection heapsort might leave us with 14 slightly longer runs, which is a perfect distribution for the 3rd order cascade).

In general, the decision about which Merge algorithm is best will depend on at LEAST the following:

1. number of devices available
2. amount of RAM at your disposal
3. number of initial runs
4. the blocking factor (how many records can you be guaranteed to get with one seek). Remember, you don't ever want to read part of a record and you can never read part of a block to save seeks.
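
To see what the blocking factor does to the counts, here is a small Python sketch that rounds a buffer down to whole blocks and whole records before counting seeks. The 8,192-byte block size is purely an assumed value for illustration (the notes do not give one), and `seeks_per_run` is an illustrative name.

```python
def seeks_per_run(records_per_run, buffer_bytes, record_bytes, block_bytes):
    """Seeks to read one run through a buffer that holds whole blocks only."""
    records_per_block = block_bytes // record_bytes  # never split a record
    blocks_per_buffer = buffer_bytes // block_bytes  # never read part of a block
    records_per_seek = blocks_per_buffer * records_per_block
    if records_per_seek == 0:
        raise ValueError("buffer too small to hold one whole record")
    return -(-records_per_run // records_per_seek)   # ceiling division


# One run from the 40-way merge example: 10,000 records of 2000 bytes,
# 500,000-byte buffer, with an ASSUMED block size of 8,192 bytes.
print(seeks_per_run(10_000, 500_000, 2000, 8192))

# Same run if records could be packed into the buffer with no block boundaries.
print(seeks_per_run(10_000, 500_000, 2000, 2000))
```

With the assumed block size only 244 records arrive per seek instead of 250, so the run takes 41 seeks rather than 40.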