Crunch / CRUNCH-637

crunch.bytes.per.reduce.task cannot be used with GroupingOptions


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Core
    • Labels:
      None

      Description

      I expected to be able to set crunch.bytes.per.reduce.task in GroupingOptions to fine-tune job parallelism for a single grouping:

           .groupByKey(
               GroupingOptions.builder()
                   .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
                   .partitionerClass(RoundRobinPartitioner.class)
                   .build())
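To make the expected effect concrete, here is a minimal standalone sketch of how a bytes-per-reduce-task value could translate into a reducer count. The formula (one reducer plus one per full bytesPerTask of input, optionally capped) and the names ReducerEstimate and recommendedReducers are assumptions for illustration; this is not Crunch's PartitionUtils code, and the 2 GB input size is hypothetical.

```java
// Hypothetical sketch: maps a bytes-per-reduce-task setting to a reducer count.
// The formula (1 + inputBytes / bytesPerTask, capped at maxReducers) is an
// assumption for illustration, not a copy of Crunch's PartitionUtils.
class ReducerEstimate {

    static int recommendedReducers(long inputBytes, long bytesPerTask, int maxReducers) {
        if (bytesPerTask <= 0) {
            throw new IllegalArgumentException("bytesPerTask must be positive");
        }
        // One reducer, plus one more for every full bytesPerTask of input.
        int recommended = 1 + (int) (inputBytes / bytesPerTask);
        // In this sketch a non-positive cap means "no cap".
        if (maxReducers > 0 && recommended > maxReducers) {
            return maxReducers;
        }
        return recommended;
    }

    public static void main(String[] args) {
        long inputBytes = 2_000_000_000L; // hypothetical 2 GB input
        System.out.println(recommendedReducers(inputBytes, 1_000_000_000L, 0)); // 3
        System.out.println(recommendedReducers(inputBytes, 50_000_000L, 0));    // 41
    }
}
```

Under these assumptions, lowering the setting from 1,000,000,000 to 50,000,000 bytes raises the estimate for a 2 GB input from 3 to 41 reducers, which is the kind of per-grouping tuning the snippet above is meant to express.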
      

      However, PGroupedTableImpl ignores GroupingOptions.extraConf and reads crunch.bytes.per.reduce.task from the pipeline configuration instead:

      public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {

          public void configureShuffle(Job job) {
              this.ptype.configureShuffle(job, this.groupingOptions);
              if (this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
                  // The reducer count is derived from the pipeline-wide Configuration,
                  // so values passed through GroupingOptions.extraConf are not consulted here.
                  int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
                  if (numReduceTasks > 0) {
                      // [...]
                  }
              }
          }
      }
      

      Is there any reason not to give GroupingOptions.extraConf a chance, i.e. consult it before falling back to the pipeline configuration?
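A minimal sketch of the lookup order being asked for, assuming plain Maps stand in for GroupingOptions.extraConf and the pipeline Configuration: the per-grouping value is checked first, then the pipeline-wide value, then a default. The class ShuffleConfLookup and the method resolveBytesPerReduceTask are hypothetical and are not part of Crunch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed precedence for crunch.bytes.per.reduce.task:
// per-grouping extra conf first, then pipeline conf, then a default.
// Plain Maps stand in for GroupingOptions.extraConf and Hadoop's Configuration.
class ShuffleConfLookup {

    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";

    static long resolveBytesPerReduceTask(Map<String, String> groupingExtraConf,
                                          Map<String, String> pipelineConf,
                                          long defaultValue) {
        // A value set on the grouping wins when present.
        String value = groupingExtraConf == null ? null : groupingExtraConf.get(BYTES_PER_REDUCE_TASK);
        if (value == null) {
            // Fall back to the pipeline-wide setting (the only source consulted today).
            value = pipelineConf.get(BYTES_PER_REDUCE_TASK);
        }
        return value == null ? defaultValue : Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = new HashMap<>();
        pipelineConf.put(BYTES_PER_REDUCE_TASK, "1000000000");

        Map<String, String> extraConf = new HashMap<>();
        extraConf.put(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));

        System.out.println(resolveBytesPerReduceTask(extraConf, pipelineConf, 1L));       // 50000000
        System.out.println(resolveBytesPerReduceTask(new HashMap<>(), pipelineConf, 1L)); // 1000000000
    }
}
```

Keeping the pipeline configuration as the fallback means pipelines that only set crunch.bytes.per.reduce.task globally would keep their current behavior; only groupings that set it explicitly would change.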


            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Clément MATHIEU (clement@unportant.info)
            • Votes: 0
            • Watchers: 2

              Dates

              • Created:
                Updated: