Uploaded image for project: 'Crunch (Retired)'
  1. Crunch (Retired)
  2. CRUNCH-642

Enable numReducers option for methods in Distinct

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Trivial
    • Resolution: Fixed
    • 0.14.0
    • 1.0.0
    • Core
    • None
    • Patch

    Description

      The groupByKey invocation in the Distinct class currently uses the default (recommended) number of reducers without providing an option to override this:

      public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
        Preconditions.checkArgument(flushEvery > 0);
        PType<S> pt = input.getPType();
        PTypeFamily ptf = pt.getFamily();
        return input
            .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt), ptf.tableOf(pt, ptf.nulls()))
            .groupByKey()
            .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
      }
      

      Would it be possible to enhance this method such that it is possible to customize the number of reducers? Either explicitly or via a GroupingOptions object.

      Attachments

        Activity

          People

            jwills Josh Wills
            xaviert Xavier
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: