Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Implemented
1.0.0
None
Description
DataSet.partitionByRange() does not allow specifying the sort order of the partitioning fields. This is fine if range partitioning is only used to reduce skewed partitioning.
However, it is not sufficient if range partitioning is used to sort a data set in parallel.
Since DataSet.partitionByRange() is part of the @Public API and cannot easily be changed, I propose adding a method withOrders(Order... orders) to PartitionOperator. The method should throw an exception if the partitioning method of the PartitionOperator is not range partitioning.
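
A minimal sketch of the proposed behaviour, to make the intended contract concrete. This is not Flink code: PartitionOperatorSketch, its nested Order and PartitionMethod enums, the constructor, and getOrders() are hypothetical stand-ins. The ticket only asks for "an exception", so the exception types are assumptions, as are the ascending default and the check that one order is given per key field.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for PartitionOperator (not the real Flink class).
final class PartitionOperatorSketch {

    enum Order { ASCENDING, DESCENDING }

    enum PartitionMethod { HASH, RANGE, REBALANCE }

    private final PartitionMethod method;
    private final int[] keyFields;
    private Order[] orders;

    PartitionOperatorSketch(PartitionMethod method, int... keyFields) {
        this.method = method;
        this.keyFields = keyFields.clone();
        // Assumption: every key field is ascending unless withOrders() is called.
        this.orders = new Order[keyFields.length];
        Arrays.fill(this.orders, Order.ASCENDING);
    }

    // Sets the sort order per key field; only valid for range partitioning.
    PartitionOperatorSketch withOrders(Order... orders) {
        if (method != PartitionMethod.RANGE) {
            // Exception type is an assumption; the ticket does not name one.
            throw new IllegalStateException(
                "withOrders() requires range partitioning, but the method is " + method);
        }
        if (orders.length != keyFields.length) {
            // Assumption: exactly one order per partitioning key field.
            throw new IllegalArgumentException(
                "Expected " + keyFields.length + " orders, got " + orders.length);
        }
        this.orders = orders.clone();
        return this; // allows chaining after the partitioning call
    }

    Order[] getOrders() {
        return orders.clone();
    }

    public static void main(String[] args) {
        // Range partitioning on fields 0 and 1 with explicit orders.
        PartitionOperatorSketch byRange =
            new PartitionOperatorSketch(PartitionMethod.RANGE, 0, 1)
                .withOrders(Order.DESCENDING, Order.ASCENDING);
        System.out.println(Arrays.toString(byRange.getOrders()));

        // Any other partitioning method rejects withOrders().
        try {
            new PartitionOperatorSketch(PartitionMethod.HASH, 0)
                .withOrders(Order.ASCENDING);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because withOrders() lives on the operator returned by the partitioning call, the signature of DataSet.partitionByRange() stays untouched. Under the proposal, a call would presumably read something like dataSet.partitionByRange(0).withOrders(Order.DESCENDING).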