Description
The groupByKey invocation in the Distinct class always uses the default (recommended) number of reducers, and there is no option to override it:
    public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery) {
      Preconditions.checkArgument(flushEvery > 0);
      PType<S> pt = input.getPType();
      PTypeFamily ptf = pt.getFamily();
      return input
          .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt), ptf.tableOf(pt, ptf.nulls()))
          .groupByKey()
          .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
    }
Would it be possible to enhance this method so that the number of reducers can be customized, either explicitly (as an int argument) or via a GroupingOptions object?
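To make the request concrete, here is a rough sketch of what the two variants could look like. It is a fragment meant to sit inside the Distinct class next to the existing method, not a standalone program and not a finished patch. The overload signatures and the numReducers parameter name are hypothetical; the sketch assumes that PTable.groupByKey(GroupingOptions) and GroupingOptions.builder().numReducers(int) are available in the Crunch version in use.

```java
// Hypothetical overloads for the Distinct class. They reuse the existing
// PreDistinctFn and PostDistinctFn and need org.apache.crunch.GroupingOptions
// imported in addition to the imports the class already has.

// Variant 1: explicit reducer count, delegating to the GroupingOptions variant.
public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery, int numReducers) {
  Preconditions.checkArgument(numReducers > 0);
  GroupingOptions options = GroupingOptions.builder().numReducers(numReducers).build();
  return distinct(input, flushEvery, options);
}

// Variant 2: caller supplies the full GroupingOptions object.
public static <S> PCollection<S> distinct(PCollection<S> input, int flushEvery, GroupingOptions options) {
  Preconditions.checkArgument(flushEvery > 0);
  PType<S> pt = input.getPType();
  PTypeFamily ptf = pt.getFamily();
  return input
      .parallelDo("pre-distinct", new PreDistinctFn<S>(flushEvery, pt), ptf.tableOf(pt, ptf.nulls()))
      .groupByKey(options) // the only change relative to the current method
      .parallelDo("post-distinct", new PostDistinctFn<S>(), pt);
}
```

With overloads like these, a caller could write Distinct.distinct(input, flushEvery, 20) to request 20 reducers, while the existing two-argument method would keep its current behavior.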