Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.14.0
Description
I expected to be able to set crunch.bytes.per.reduce.task through GroupingOptions to fine-tune the parallelism of a single groupByKey:
    .groupByKey(
        GroupingOptions.builder()
            .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
            .partitionerClass(RoundRobinPartitioner.class)
            .build())
However, PGroupedTableImpl ignores GroupingOptions.extraConf when it sizes the shuffle and reads crunch.bytes.per.reduce.task from the pipeline configuration only:
    public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {
      public void configureShuffle(Job job) {
        this.ptype.configureShuffle(job, this.groupingOptions);
        if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
          int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
          if (numReduceTasks > 0) {
            // [...]
Is there any reason not to take GroupingOptions.extraConf into account here?
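
To make the request concrete, here is a minimal, dependency-free sketch of the lookup order I have in mind: per-grouping settings are overlaid on the pipeline configuration before the reducer count is derived. It does not use the Crunch or Hadoop classes. Plain maps stand in for Configuration and for GroupingOptions.extraConf, the class and method names are hypothetical, the 1 GB fallback is an assumed default for illustration, and the ceiling division is a stand-in for whatever PartitionUtils.getRecommendedPartitions actually computes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; not part of Crunch.
class ReducerSizingSketch {
    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";

    // Assumed fallback for this sketch only; the real default may differ.
    static final long ASSUMED_DEFAULT_BYTES_PER_REDUCE_TASK = 1_000_000_000L;

    // Resolve the setting with per-grouping values taking precedence
    // over pipeline-wide values.
    static long bytesPerReduceTask(Map<String, String> pipelineConf,
                                   Map<String, String> extraConf) {
        Map<String, String> effective = new HashMap<>(pipelineConf);
        if (extraConf != null) {
            effective.putAll(extraConf); // grouping-level overrides win
        }
        String value = effective.get(BYTES_PER_REDUCE_TASK);
        return value == null ? ASSUMED_DEFAULT_BYTES_PER_REDUCE_TASK : Long.parseLong(value);
    }

    // Simplified sizing rule: ceiling of inputBytes / bytesPerTask, at least 1.
    static int recommendedPartitions(long inputBytes,
                                     Map<String, String> pipelineConf,
                                     Map<String, String> extraConf) {
        long bytesPerTask = bytesPerReduceTask(pipelineConf, extraConf);
        long partitions = (inputBytes + bytesPerTask - 1) / bytesPerTask;
        return (int) Math.max(1L, partitions);
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = new HashMap<>();
        Map<String, String> extraConf = new HashMap<>();
        extraConf.put(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));

        long inputBytes = 500_000_000L;
        // Pipeline configuration only: falls back to the assumed default.
        System.out.println(recommendedPartitions(inputBytes, pipelineConf, null));      // prints 1
        // Grouping-level override applied: 500 MB / 50 MB per task.
        System.out.println(recommendedPartitions(inputBytes, pipelineConf, extraConf)); // prints 10
    }
}
```

In PGroupedTableImpl the equivalent change would presumably be to build a copy of the pipeline configuration, apply the grouping options' extra settings to it, and pass that copy to getRecommendedPartitions. How extraConf is exposed for that purpose is something I have not checked.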