Details
Type: Sub-task
Status: Closed
Priority: Minor
Resolution: Won't Do
Description
This issue is similar to FLINK-3477, but adds a hash-based combine strategy for CombineFunction instead of ReduceFunction.
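The interface of CombineFunction differs from that of ReduceFunction: its combine method receives an Iterable<T> instead of two T values. Hence, if we pass an Iterable<T> that holds exactly two values (for example, the partial result kept for a key and a newly arrived record), we can apply the function incrementally, just as we would a ReduceFunction.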
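The two-value trick can be sketched in plain Java without Flink. This is a minimal illustration under stated assumptions: `CombineFn`, `asReduce`, and `SUM` are hypothetical names, `CombineFn<T>` stands in for a CombineFunction whose output type equals its input type, and the function is assumed to be associative and order-independent, which is what makes applying it to partial groups safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class CombineAsReduce {

    // Hypothetical stand-in for a CombineFunction<T, T> (input type equals output type).
    @FunctionalInterface
    interface CombineFn<T> {
        T combine(Iterable<T> values) throws Exception;
    }

    // Adapts a combine function to a pairwise reduce by handing it an Iterable of exactly two values.
    static <T> BinaryOperator<T> asReduce(CombineFn<T> fn) {
        return (a, b) -> {
            List<T> pair = new ArrayList<>(2);
            pair.add(a);
            pair.add(b);
            try {
                return fn.combine(pair);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    // Example combine function: sums all values it is given.
    static final CombineFn<Integer> SUM = values -> {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    };

    public static void main(String[] args) throws Exception {
        List<Integer> group = Arrays.asList(3, 5, 7);
        // Incremental, ReduceFunction-style evaluation: ((3 + 5) + 7).
        int incremental = group.stream().reduce(asReduce(SUM)).get();
        // Whole-group evaluation: one combine call over the full Iterable.
        int whole = SUM.combine(group);
        System.out.println(incremental + " " + whole); // 15 15
    }
}
```

Both evaluations agree here because summation is associative; a combine function without that property would give different results depending on how records are grouped.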
At the moment, CombineFunction is wrapped in a GroupCombineFunction and hence executed using the GroupReduceCombineDriver.
We should add two dedicated drivers, CombineDriver and ChainedCombineDriver, and two driver strategies, HASH_COMBINE and SORT_COMBINE.
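The two proposed strategies differ in how records with equal keys are brought together. The following in-memory sketch is not Flink code: `hashCombine`, `sortCombine`, and `SUM` are hypothetical names, a plain `Function<Iterable<Integer>, Integer>` stands in for a CombineFunction, and the memory management of a real driver (spilling or emitting when the buffer is full) is ignored. The hash variant keeps one partial result per key and folds each record in with a two-value call; the sort variant sorts by key and calls the function once per run of equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class CombineStrategies {

    // Stand-in for a CombineFunction<Integer, Integer>: sums the values it is given.
    static final Function<Iterable<Integer>, Integer> SUM = values -> {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    };

    // HASH_COMBINE sketch: one partial result per key in a hash table; each incoming
    // record is folded into it by calling combine on a two-element Iterable.
    static Map<String, Integer> hashCombine(List<Map.Entry<String, Integer>> input,
                                            Function<Iterable<Integer>, Integer> combine) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> rec : input) {
            Integer partial = table.get(rec.getKey());
            table.put(rec.getKey(), partial == null
                    ? rec.getValue()
                    : combine.apply(Arrays.asList(partial, rec.getValue())));
        }
        return table;
    }

    // SORT_COMBINE sketch: sort records by key, then call combine once per run of equal keys.
    static Map<String, Integer> sortCombine(List<Map.Entry<String, Integer>> input,
                                            Function<Iterable<Integer>, Integer> combine) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(input);
        sorted.sort(Map.Entry.comparingByKey());
        Map<String, Integer> out = new TreeMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).getKey();
            List<Integer> group = new ArrayList<>();
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                group.add(sorted.get(i).getValue());
                i++;
            }
            out.put(key, combine.apply(group));
        }
        return out;
    }

    public static void main(String[] args) {
        // Map.entry requires Java 9 or later.
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                Map.entry("b", 2), Map.entry("a", 1), Map.entry("b", 3), Map.entry("a", 6));
        System.out.println(new TreeMap<>(hashCombine(input, SUM))); // {a=7, b=5}
        System.out.println(sortCombine(input, SUM));                // {a=7, b=5}
    }
}
```

The hash variant avoids sorting and touches each record once, at the cost of requiring a function that can be applied pairwise.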
Once FLINK-3477 is resolved, we can reuse its hash table.
We should also add compiler hints to `DataSet.reduceGroup()` and `Grouping.reduceGroup()` that let users choose between SORT- and HASH-based combine strategies (HASH is applicable only to CombineFunction, not to GroupCombineFunction).
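The restriction on HASH follows from the interfaces: a GroupCombineFunction writes zero or more records, possibly of a different type, to a Collector, so it cannot be folded pairwise into a single hash-table entry. A minimal sketch of how a hint could map to a driver strategy is shown below; the `CombineHint` enum and the `choose` method are hypothetical and not an existing Flink API, and resolving `OPTIMIZER_CHOOSES` to SORT is only a placeholder for a real cost-based decision.

```java
public class CombineHintSelection {

    // Hypothetical user-facing hint for reduceGroup(); not an existing Flink API.
    enum CombineHint { OPTIMIZER_CHOOSES, SORT, HASH }

    // Driver strategies proposed in this issue.
    enum DriverStrategy { SORT_COMBINE, HASH_COMBINE }

    // Maps a hint to a driver strategy. HASH is rejected for a GroupCombineFunction,
    // because only a CombineFunction can be applied to two values at a time.
    static DriverStrategy choose(CombineHint hint, boolean isCombineFunction) {
        switch (hint) {
            case HASH:
                if (!isCombineFunction) {
                    throw new IllegalArgumentException(
                            "HASH combine is only applicable to CombineFunction, not GroupCombineFunction");
                }
                return DriverStrategy.HASH_COMBINE;
            case SORT:
                return DriverStrategy.SORT_COMBINE;
            default:
                // Placeholder for OPTIMIZER_CHOOSES; a real optimizer would use cost estimates.
                return DriverStrategy.SORT_COMBINE;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(CombineHint.HASH, true));               // HASH_COMBINE
        System.out.println(choose(CombineHint.OPTIMIZER_CHOOSES, false)); // SORT_COMBINE
    }
}
```

Failing early on an inapplicable hint, rather than silently falling back to SORT, is one possible design choice; silently ignoring the hint would be the other.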