Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
0.10.1
-
None
Description
The optimizer does not inject a combiner if the input of a Reducer or GroupReducer is explicitly partitioned as in the following example
DataSet<Tuple2<String,Integer>> words = ... DataSet<Tuple2<String,Integer>> counts = words .partitionByHash(0) .groupBy(0) .sum(1);
Explicit partitioning can be useful to enforce partitioning on a subset of keys or to use a different partitioning method (custom or range partitioning).
This issue should be fixed by changing the instantiate() methods of the ReduceProperties and GroupReduceWithCombineProperties classes such that a combine is injected in front of a PartitionPlanNode if it is the input of a Reduce or GroupReduce operator. This should only happen, if the Reducer is the only successor of the Partition operator.