Type: New Feature
Affects Version/s: None
Fix Version/s: None
Create a new operator that caches a number of record batches and then coordinates across the cluster on the distribution of partitioning keys to determine a reasonable set of range partitions. The outgoing stream should include a partition key whose number of possible values equals the width of the receiving fragment.
- the histogram (or similar summary) should be held in the distributed cache
- need to figure out the logic for how long to wait before the partitioning estimate is good enough
- need to update the partitioning sender so that it can drop the partitioning column rather than sending it onward
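
To make the idea concrete, here is a rough sketch of how range boundaries might be derived from a sample of cached keys and then used to tag each record with a partition key. It is only an illustration under assumptions: the function names `range_boundaries` and `partition_id` are hypothetical, the sample is assumed to fit in memory on one node, and simple quantile splitting stands in for whatever histogram the operator ends up using. It does not cover the cluster coordination or the distributed cache.

```python
from bisect import bisect_right


def range_boundaries(sampled_keys, width):
    """Pick width - 1 split points from a sample of partitioning keys.

    Hypothetical helper: `width` stands for the width of the receiving
    fragment, so the result defines `width` ranges.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    ordered = sorted(sampled_keys)
    if not ordered:
        return []
    # Take the key found at each i/width quantile of the sorted sample.
    return [ordered[(i * len(ordered)) // width] for i in range(1, width)]


def partition_id(key, boundaries):
    """Map a key to a partition key in the range 0 .. len(boundaries)."""
    # Keys equal to a split point go to the range on its right.
    return bisect_right(boundaries, key)


if __name__ == "__main__":
    sample = list(range(100))  # stand-in for keys from the cached batches
    bounds = range_boundaries(sample, 4)
    print(bounds)
    print([partition_id(k, bounds) for k in (0, 24, 25, 99)])
```

With a uniform sample like the one above, each of the four ranges receives the same share of keys. A skewed sample would move the split points, which is the behaviour the histogram is meant to provide.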