  1. Crunch (Retired)
  2. CRUNCH-637

crunch.bytes.per.reduce.task cannot be used with GroupingOptions

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.14.0
    • Fix Version/s: None
    • Component/s: Core
    • Labels: None

    Description

      I expected to be able to set crunch.bytes.per.reduce.task through GroupingOptions to fine-tune the parallelism of a single groupByKey.

          .groupByKey(
                  GroupingOptions.builder()
                          .conf(PartitionUtils.BYTES_PER_REDUCE_TASK, Long.toString(50_000_000))
                          .partitionerClass(RoundRobinPartitioner.class)
                          .build())
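
To make the intent concrete, here is a minimal sketch of how a bytes-per-reducer target would translate into a reducer count. It assumes the count is derived as one reducer plus one more per full bytesPerReduceTask of estimated input, which is my reading of PartitionUtils and may differ between Crunch versions. ReducerEstimate and the 1 GB input size are hypothetical, and the 1 GB pipeline-level value is an assumption for illustration only.

```java
// Hypothetical helper, NOT part of Crunch: models how a bytes-per-reducer
// target could translate into a number of reduce tasks.
final class ReducerEstimate {
    private ReducerEstimate() {}

    /**
     * Returns one reducer, plus one more for every full bytesPerReduceTask
     * of estimated input. The formula is an assumption, not a confirmed
     * copy of Crunch's implementation.
     */
    static int recommendedPartitions(long inputSizeBytes, long bytesPerReduceTask) {
        if (inputSizeBytes < 0) {
            throw new IllegalArgumentException("inputSizeBytes must be >= 0");
        }
        if (bytesPerReduceTask <= 0) {
            throw new IllegalArgumentException("bytesPerReduceTask must be > 0");
        }
        long fullChunks = inputSizeBytes / bytesPerReduceTask;
        // Clamp before adding 1 so the result always fits in an int.
        return (int) (Math.min(fullChunks, Integer.MAX_VALUE - 1L) + 1L);
    }

    public static void main(String[] args) {
        long estimatedInput = 1_000_000_000L; // assumed 1 GB of shuffle input

        // Assumed pipeline-level setting of 1 GB per reduce task.
        System.out.println(recommendedPartitions(estimatedInput, 1_000_000_000L)); // prints 2

        // The 50 MB target from the snippet above.
        System.out.println(recommendedPartitions(estimatedInput, 50_000_000L)); // prints 21
    }
}
```

Under these assumptions, the per-grouping value of 50_000_000 would raise the reducer count for this one shuffle from 2 to 21 without touching the rest of the pipeline.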
      

      However, PGroupedTableImpl ignores GroupingOptions.extraConf and reads crunch.bytes.per.reduce.task from the pipeline configuration only.

      public class PGroupedTableImpl<K, V> extends BaseGroupedTable<K, V> implements MRCollection {
      
          public void configureShuffle(Job job) {
              this.ptype.configureShuffle(job, this.groupingOptions);
              if(this.groupingOptions == null || this.groupingOptions.getNumReducers() <= 0) {
                  int numReduceTasks = PartitionUtils.getRecommendedPartitions(this, this.getPipeline().getConfiguration());
                  if(numReduceTasks > 0) {
                      // [...] 
      

      Is there any reason not to take GroupingOptions.extraConf into account here?
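
As an illustration of the behaviour I am asking about, here is a minimal sketch in which per-grouping settings are overlaid on the pipeline configuration before the bytes-per-reducer value is read. It uses plain java.util.Map instead of Hadoop's Configuration so that it runs standalone. GroupingConfOverlay, effectiveConf and bytesPerReduceTask are hypothetical names, not Crunch APIs, and this is not a patch against PGroupedTableImpl.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT Crunch code: plain maps stand in for the
// pipeline Configuration and for GroupingOptions.extraConf.
final class GroupingConfOverlay {
    static final String BYTES_PER_REDUCE_TASK = "crunch.bytes.per.reduce.task";

    private GroupingConfOverlay() {}

    /** Copies the pipeline settings, then lets per-grouping settings override them. */
    static Map<String, String> effectiveConf(Map<String, String> pipelineConf,
                                             Map<String, String> extraConf) {
        Map<String, String> merged = new HashMap<>(pipelineConf);
        if (extraConf != null) {
            merged.putAll(extraConf); // per-grouping values win on key conflicts
        }
        return merged;
    }

    /** Reads the bytes-per-reducer target, falling back to defaultValue when unset. */
    static long bytesPerReduceTask(Map<String, String> conf, long defaultValue) {
        String raw = conf.get(BYTES_PER_REDUCE_TASK);
        return raw == null ? defaultValue : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> pipelineConf = Map.of(BYTES_PER_REDUCE_TASK, "1000000000");
        Map<String, String> extraConf = Map.of(BYTES_PER_REDUCE_TASK, Long.toString(50_000_000));

        // Current behaviour as described above: only the pipeline value is seen.
        System.out.println(bytesPerReduceTask(pipelineConf, 1_000_000_000L)); // prints 1000000000

        // Requested behaviour: the per-grouping value overrides the pipeline value.
        Map<String, String> merged = effectiveConf(pipelineConf, extraConf);
        System.out.println(bytesPerReduceTask(merged, 1_000_000_000L)); // prints 50000000
    }
}
```

The overlay copies the pipeline settings rather than mutating them, so a value set on one grouping would not leak into other stages of the same pipeline.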

          People

            jwills Josh Wills
            clement@unportant.info Clément MATHIEU
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: