Resolution: Won't Do
Affects Version/s: None
Fix Version/s: None
Component/s: Runtime / Task
This issue is similar to FLINK-3477 but adds a hash-based strategy for CombineFunction instead of ReduceFunction.
The interface of CombineFunction differs from ReduceFunction in that it receives an Iterable<T> instead of two T values. Hence, if we always hand it an Iterable<T> that contains exactly two values, we can combine incrementally in the same way as with a ReduceFunction.
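To illustrate the idea, here is a minimal sketch in plain Java with no Flink dependency. `SimpleCombineFunction`, `hashCombine`, and `SUM_VALUES` are hypothetical names made up for this example, and a `LinkedHashMap` stands in for the runtime's hash table. The sketch assumes the combine function returns the same type it consumes, since the partial result is fed back in as the first of the two values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class HashCombineSketch {

    /** Hypothetical, simplified stand-in for a combine function whose input and output types match. */
    public interface SimpleCombineFunction<T> {
        T combine(Iterable<T> values);
    }

    /** Example combine function: sums the values of entries that share a key. */
    public static final SimpleCombineFunction<Map.Entry<String, Integer>> SUM_VALUES = values -> {
        String key = null;
        int total = 0;
        for (Map.Entry<String, Integer> v : values) {
            key = v.getKey();
            total += v.getValue();
        }
        return Map.entry(key, total);
    };

    /**
     * Hash-based combine: keeps one partial result per key and updates it by
     * calling the combine function with a two-element Iterable (partial result, new record).
     */
    public static <K, T> Map<K, T> hashCombine(
            Iterable<T> records, Function<T, K> keyOf, SimpleCombineFunction<T> fn) {
        Map<K, T> table = new LinkedHashMap<>();
        for (T record : records) {
            K key = keyOf.apply(record);
            T current = table.get(key);
            if (current == null) {
                // First record for this key: store it as the initial partial result.
                table.put(key, record);
            } else {
                // Same shape as ReduceFunction.reduce(a, b), expressed through an Iterable.
                table.put(key, fn.combine(Arrays.asList(current, record)));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
                Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3),
                Map.entry("b", 4), Map.entry("a", 5));
        Map<String, Map.Entry<String, Integer>> result =
                hashCombine(input, Map.Entry::getKey, SUM_VALUES);
        System.out.println(result.get("a").getValue()); // prints 9
        System.out.println(result.get("b").getValue()); // prints 6
    }
}
```

Note that this only works when the combine function is associative and produces correct results for two-element inputs, which is the same contract a ReduceFunction already has to satisfy.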
At the moment, CombineFunction is wrapped in a GroupCombineFunction and hence executed using the GroupReduceCombineDriver.
We should add two dedicated drivers, CombineDriver and ChainedCombineDriver, and two driver strategies, HASH_COMBINE and SORT_COMBINE.
Now that FLINK-3477 is resolved, we can reuse its hash table.
We should also add compiler hints to `DataSet.reduceGroup()` and `Grouping.reduceGroup()` to allow users to choose between a SORT-based and a HASH-based combine strategy (HASH will only be applicable to CombineFunction, not to GroupCombineFunction).