Crunch / CRUNCH-642

Enable numReducers option for methods in Distinct

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 1.0.0
    • Component/s: Core
    • Labels:
      None
    • Flags:
      Patch

      Description

      The groupByKey invocation in the Distinct class currently uses the default (recommended) number of reducers without providing an option to override this:

      public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
        Preconditions.checkArgument(flushEvery > 0);
        PType<S> pt = input.getPType();
        PTypeFamily ptf = pt.getFamily();
        return input
            .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt), ptf.tableOf(pt, ptf.nulls()))
            .groupByKey()
            .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
      }
      

      Would it be possible to enhance this method so that the number of reducers can be customized, either explicitly or via a GroupingOptions object?
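
      To illustrate the request, here is a minimal, stdlib-only sketch of the overload pattern being asked for. It is not Crunch code and not the fix that was committed: DistinctSketch, DEFAULT_REDUCERS, and the in-memory "shuffle" are hypothetical stand-ins that model the pre-distinct / groupByKey / post-distinct stages on plain lists, and the flush rule only approximates what PreDistinctFn does. In Crunch itself, PTable offers groupByKey(int) and groupByKey(GroupingOptions) overloads, so the real change would presumably thread a new parameter through to one of those calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory model of the distinct pipeline; none of these names are Crunch APIs.
final class DistinctSketch {

    // Assumption: stands in for the planner's default (recommended) reducer count.
    static final int DEFAULT_REDUCERS = 1;

    // Existing-style signature: keeps working and delegates with the default.
    static <S> List<S> distinct(List<S> input, int flushEvery) {
        return distinct(input, flushEvery, DEFAULT_REDUCERS);
    }

    // Requested-style overload: the caller picks the number of reducers (partitions).
    static <S> List<S> distinct(List<S> input, int flushEvery, int numReducers) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be > 0");
        }
        if (numReducers <= 0) {
            throw new IllegalArgumentException("numReducers must be > 0");
        }

        // One set per simulated reducer; a set models grouping by key.
        List<Set<S>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new LinkedHashSet<>());
        }

        // "pre-distinct": de-duplicate locally, flushing once the buffer holds flushEvery values.
        Set<S> buffer = new LinkedHashSet<>();
        for (S value : input) {
            buffer.add(value);
            if (buffer.size() >= flushEvery) {
                shuffle(buffer, partitions);
            }
        }
        shuffle(buffer, partitions); // final flush of whatever is left

        // "post-distinct": each reducer emits every key it received exactly once.
        List<S> out = new ArrayList<>();
        for (Set<S> partition : partitions) {
            out.addAll(partition);
        }
        return out;
    }

    // Route each buffered key to a partition by hash, then clear the buffer.
    private static <S> void shuffle(Set<S> buffer, List<Set<S>> partitions) {
        for (S key : buffer) {
            int index = Math.floorMod(Objects.hashCode(key), partitions.size());
            partitions.get(index).add(key);
        }
        buffer.clear();
    }
}
```

      In this model the set of distinct values is the same for any reducer count; numReducers only changes how keys are spread across partitions, which is why exposing it as an optional overload leaves existing callers unaffected.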

            People

            • Assignee: Josh Wills (jwills)
            • Reporter: Xavier (xaviert)
            • Votes: 0
            • Watchers: 3
