Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
Situation:
Assume the table is partitioned by day.
Node A manages the data for Day 1, 3, and 5, and Node B manages the data for Day 2 and 4. In the current implementation, a batch that the coordinator node fetches from Node A may contain rows from both Day 1 and Day 3, while the batch fetched from Node B contains rows from Day 2. Because the Day 2 rows must be placed between the Day 1 and Day 3 rows, the coordinator node has to merge the two batches row by row to produce an ordered batch.
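The row-level merge described above can be pictured as a k-way merge in which every row passes through a heap. The following is a minimal Python sketch, not Tajo's actual code: rows are assumed to be `(day, sort_key)` tuples, and `merge_rows` is a hypothetical helper name.

```python
import heapq
from typing import List, Tuple

# Hypothetical row representation: (partition_day, sort_key).
Row = Tuple[int, int]


def merge_rows(batches: List[List[Row]]) -> List[Row]:
    """Row-level k-way merge: each row is compared individually via a heap.

    Assumes every input batch is already sorted.
    """
    return list(heapq.merge(*batches))


# Node A's batch spans Day 1 and Day 3; Node B's batch holds Day 2,
# so the Day 2 rows must be interleaved between Node A's rows.
batch_a = [(1, 10), (1, 20), (3, 5)]
batch_b = [(2, 7), (2, 8)]
merged = merge_rows([batch_a, batch_b])
# merged == [(1, 10), (1, 20), (2, 7), (2, 8), (3, 5)]
```

Every output row costs a heap operation here, which is the merging overhead the issue aims to reduce.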
If, instead, no batch ever crosses a partition boundary, the coordinator node could return whole batches without merging their rows: it would keep the pending batches in a heap keyed by each batch's first element and always emit the batch at the top. This could reduce the merging overhead.
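A minimal sketch of the proposed batch-level heap follows, again in Python and under stated assumptions: each node's batches arrive in sorted order, no batch crosses a partition boundary, and the sort order is led by the partition column, so batches from different nodes never overlap. The names `merge_batches` and `_next_batch` are hypothetical, not taken from the project.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

# Hypothetical representations: a row is (partition_day, sort_key), a batch is a list of rows.
Row = Tuple[int, int]
Batch = List[Row]


def _next_batch(stream: Iterator[Batch]) -> Optional[Batch]:
    """Return the next non-empty batch from a node's stream, or None when exhausted."""
    for batch in stream:
        if batch:
            return batch
    return None


def merge_batches(streams: List[Iterator[Batch]]) -> Iterator[Batch]:
    """Emit whole batches in global order without merging rows.

    Only the first row of each batch is compared; this is correct only if
    batches from different nodes cover disjoint, non-interleaving key ranges.
    """
    heap = []
    for node_id, stream in enumerate(streams):
        batch = _next_batch(stream)
        if batch is not None:
            # node_id breaks ties so two batch lists are never compared directly.
            heapq.heappush(heap, (batch[0], node_id, batch))
    while heap:
        _, node_id, batch = heapq.heappop(heap)
        yield batch  # returned as-is, no row-level merge
        nxt = _next_batch(streams[node_id])
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], node_id, nxt))


# Node A holds Day 1, 3, 5 and Node B holds Day 2, 4; each batch stays within one day.
node_a = iter([[(1, 10), (1, 20)], [(3, 5)], [(5, 1)]])
node_b = iter([[(2, 7), (2, 8)], [(4, 2)]])
out = list(merge_batches([node_a, node_b]))
# out == [[(1, 10), (1, 20)], [(2, 7), (2, 8)], [(3, 5)], [(4, 2)], [(5, 1)]]
```

The heap now holds at most one entry per node, and each heap operation moves an entire batch rather than a single row.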