Improve Run Merging: Reduce Number of Merge Passes
More generally, a higher-order merge reduces the cost of the optimal merge tree.
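To make the effect of merge order on pass count concrete, here is a minimal sketch. It assumes equal-length initial runs and that every pass merges up to k runs at a time; the function name `merge_passes` is hypothetical and used only for illustration.

```python
def merge_passes(num_runs, k):
    """Count the passes needed to reduce num_runs runs to one run
    when each pass merges up to k runs at a time (assumed model)."""
    if k < 2:
        raise ValueError("merge order k must be at least 2")
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // k)  # ceiling division: runs left after this pass
        passes += 1
    return passes


# Compare a few merge orders for 64 initial runs.
for order in (2, 4, 8):
    print(order, merge_passes(64, order))
```

Under these assumptions, each pass reads and writes every record once, so fewer passes means less disk traffic.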
[Figure: runs are read from disk into memory, merged, and the merged output is written to disk.]
Partitioning Of Memory
[Figure: memory partitioned into output buffers O0 and O1 and input buffers I0, I1, ..., Ib, with disk on both sides.]
Buffer Allocation
When ready to read a buffer load, determine which run will exhaust first.
Examine the key of the last record read from each of the k runs. The run with the smallest last key read will exhaust first. Use an enforceable tie breaker.
The next buffer load of input is to come from the run that will exhaust first, so allocate an input buffer to this run.
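The selection rule above can be sketched as follows. The function name `run_to_read_next` and the choice of tie breaker (lowest run index wins) are assumptions made for illustration; the text only requires that the tie breaker be enforceable.

```python
def run_to_read_next(last_keys):
    """Return the index of the run expected to exhaust first.

    last_keys[i] is the key of the last record read so far from run i.
    Ties are broken by lowest run index (an assumed tie breaker).
    """
    return min(range(len(last_keys)), key=lambda i: (last_keys[i], i))


# Runs 1 and 2 tie on the smallest last key; the lower index is chosen.
chosen = run_to_read_next([40, 25, 25, 90])
print(chosen)
```

Whatever tie breaker is used here, the merge itself would need to apply the same one so that the predicted run really is the one that empties first.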
Buffer Layout
[Figure: output buffers, with labels F0 through F8 and R0 through R8.]
Merge k Runs
repeat
    kWayMerge;
    wait for input/output to complete;
    add new input buffer (if any) to queue for its run;
    determine run that will exhaust first;
    if (there is more input from this run)
        initiate read of next block for this run;
    initiate write of active output buffer;
    activeOutputBuffer = 1 - activeOutputBuffer;
until (merge has stopped);
If the merge hasn't stopped and an input buffer becomes empty, kWayMerge advances to the next buffer in the queue and frees the empty buffer. There may be no next buffer in the queue.
If this type of failure were to happen, two different and valid analyses would give inconsistent counts of the amount of data available to kWayMerge. Data available to kWayMerge is the data in the input buffer queues and in the active output buffer. It excludes data in a buffer being read or written.
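The kWayMerge step described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the slides' implementation: each run's queue is modeled as a deque of buffers, each buffer is a deque of keys, every run ends with an assumed sentinel key `END`, and the name `k_way_merge` and its return values are hypothetical. No real disk I/O is performed.

```python
from collections import deque

END = float("inf")  # assumed end-of-run sentinel key


def k_way_merge(queues, capacity):
    """Fill one output buffer of the given capacity from k input queues.

    queues[i] is a deque of input buffers for run i; each buffer is a
    deque of keys. Returns (output, freed, stopped), where freed counts
    input buffers emptied and released, and stopped is True once the
    end-of-run key has been merged.
    """
    output, freed = [], 0
    while len(output) < capacity:
        # The failure case from the text: a queue with no next buffer.
        if any(not q for q in queues):
            raise RuntimeError("no next buffer in queue")
        # Pick the run whose front key is smallest (ties: lowest index).
        i = min(range(len(queues)), key=lambda j: (queues[j][0][0], j))
        key = queues[i][0].popleft()
        if key == END:
            return output, freed, True  # every run is exhausted
        output.append(key)
        if not queues[i][0]:
            queues[i].popleft()  # advance to next buffer in this queue
            freed += 1           # the empty buffer is freed
    return output, freed, False


queues = [
    deque([deque([1, 4]), deque([7, END])]),
    deque([deque([2, 3]), deque([9, END])]),
]
first = k_way_merge(queues, 3)
second = k_way_merge(queues, 3)
third = k_way_merge(queues, 3)
```

In this sketch the missing-buffer check is made only when another record is actually needed, so a buffer that empties exactly as the output buffer fills does not raise until the next call.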
Alternative analysis of data available to kWayMerge at time of failure: there is < 1 buffer load in the active output buffer and <= k - 1 buffer loads in the remaining k - 1 queues, so the total data available to the k-way merge is < k buffer loads.
Suppose there is no free input buffer when a read is to be initiated. One analysis will show there are exactly k + 1 buffer loads in memory (including the newly read input buffer) at time of failure. Another analysis will show there are > k + 1 buffer loads in memory at time of failure. Note that at time of failure there is no buffer being read or written.