Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Create a new operator that caches a number of record batches and then coordinates across the cluster on the distribution of partitioning keys to determine a reasonable set of range partitions. The outgoing stream should include a partition key whose range of values matches the width of the receiving fragment.
- A histogram or similar structure should be held in the distributed cache.
- We need to work out how long to wait before the partitioning estimate is good enough.
- We need to update the partitioning sender so that it can drop the partitioning column rather than sending it onward.
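
As a rough illustration of the idea above, the following sketch shows one way range partitions could be derived from sampled keys. It is not the operator's implementation: the function names (`choose_split_points`, `partition_id`), the equi-depth quantile strategy, and the in-memory lists standing in for the cached record batches and the distributed cache are all assumptions made for the example.

```python
from bisect import bisect_right
from itertools import chain


def choose_split_points(sampled_keys, width):
    """Pick width - 1 split points that cut the sample into roughly equal ranges.

    `sampled_keys` stands in for the keys taken from the cached record batches;
    `width` is the number of receiving fragments. Both names are hypothetical.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    ordered = sorted(sampled_keys)
    if not ordered or width == 1:
        return []
    n = len(ordered)
    # Equi-depth quantiles: the i-th split point sits at position i * n / width.
    return [ordered[(i * n) // width] for i in range(1, width)]


def partition_id(key, split_points):
    """Map a key to a partition id in the range [0, len(split_points)]."""
    return bisect_right(split_points, key)


# Samples contributed by two sending fragments (assumed data for illustration).
fragment_samples = [list(range(0, 100, 2)), list(range(1, 100, 2))]
merged = list(chain.from_iterable(fragment_samples))

split_points = choose_split_points(merged, width=4)
print(split_points)                    # → [25, 50, 75]
print(partition_id(24, split_points))  # → 0
print(partition_id(25, split_points))  # → 1
print(partition_id(99, split_points))  # → 3
```

With four receivers the partition id takes the values 0 through 3, one per receiver. In this sketch the id is a synthetic value that a sender would need only for routing, which is why it could be dropped before the batch is sent onward.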