Details
Type: New Feature
Status: Resolved
Priority: P2
Resolution: Fixed
Description
Currently, the Partition transform splits a collection into n collections, and the PartitionFn can use only the element value to decide which partition a particular element belongs to.
public interface PartitionFn<T> extends Serializable {
  int partitionFor(T elem, int numPartitions);
}

public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
  return new Partition<>(new PartitionDoFn<T>(numPartitions, partitionFn));
}
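To make the current contract concrete, here is a minimal sketch in plain Java that does not depend on Beam. The interface is a local copy of the one quoted above; the class name PartitionByValueSketch and the helper partition(...) are hypothetical and only imitate, on an in-memory list, what the transform does per element.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PartitionByValueSketch {
  /** Local copy of the interface quoted above, so the sketch compiles without Beam. */
  interface PartitionFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions);
  }

  /** Hypothetical helper: routes each element to the bucket chosen by fn. */
  static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionFn<? super T> fn) {
    List<List<T>> out = new ArrayList<List<T>>();
    for (int i = 0; i < numPartitions; i++) {
      out.add(new ArrayList<T>());
    }
    for (T elem : input) {
      int p = fn.partitionFor(elem, numPartitions);
      if (p < 0 || p >= numPartitions) {
        throw new IllegalArgumentException("partition index out of range: " + p);
      }
      out.get(p).add(elem);
    }
    return out;
  }

  public static void main(String[] args) {
    // The decision depends on the element value only.
    PartitionFn<Integer> byRemainder = (elem, n) -> Math.floorMod(elem.intValue(), n);
    System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6), 3, byRemainder));
    // prints [[3, 6], [1, 4], [2, 5]]
  }
}
```

The function has no way to consult anything other than elem and numPartitions, which is the limitation this issue addresses.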
It would be useful to introduce a new API that also provides sideInputs to the partition function. Users could then write logic that uses both the element value and the sideInputs to decide which partition a particular element belongs to.
Option-1: Proposed new API:
public interface PartitionWithSideInputsFn<T> extends Serializable {
  int partitionFor(T elem, int numPartitions, Context c);
}

public static <T> Partition<T> of(
    int numPartitions,
    PartitionWithSideInputsFn<? super T> partitionFn,
    Requirements requirements) { ... }
Users can choose either of the two APIs, depending on their partitioning logic.
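The following is a rough sketch of how an Option-1 style function might behave, again in plain Java without Beam. The three-argument interface is a local copy of the proposed one. Everything else is an assumption made for illustration: Context is modelled as a lookup by name, MapContext backs it with a map, and partition(...) stands in for the transform. The real Context and Requirements types may look different.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PartitionWithSideInputsSketch {
  /** Hypothetical stand-in for the proposal's Context: looks up a side input by name. */
  interface Context {
    <V> V sideInput(String name);
  }

  /** Local copy of the proposed interface. */
  interface PartitionWithSideInputsFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions, Context c);
  }

  /** Hypothetical Context backed by a map of already-computed side input values. */
  static final class MapContext implements Context {
    private final Map<String, ?> values;

    MapContext(Map<String, ?> values) {
      this.values = values;
    }

    @SuppressWarnings("unchecked")
    @Override
    public <V> V sideInput(String name) {
      return (V) values.get(name);
    }
  }

  /** Hypothetical helper: routes each element using both its value and the context. */
  static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionWithSideInputsFn<? super T> fn, Context c) {
    List<List<T>> out = new ArrayList<List<T>>();
    for (int i = 0; i < numPartitions; i++) {
      out.add(new ArrayList<T>());
    }
    for (T elem : input) {
      int p = fn.partitionFor(elem, numPartitions, c);
      if (p < 0 || p >= numPartitions) {
        throw new IllegalArgumentException("partition index out of range: " + p);
      }
      out.get(p).add(elem);
    }
    return out;
  }

  public static void main(String[] args) {
    Context c = new MapContext(Collections.singletonMap("threshold", 10));
    // The decision uses the element value and a side input.
    PartitionWithSideInputsFn<Integer> fn = (elem, n, ctx) -> {
      int threshold = ctx.<Integer>sideInput("threshold");
      return elem < threshold ? 0 : 1;
    };
    System.out.println(partition(Arrays.asList(3, 12, 7, 25), 2, fn, c));
    // prints [[3, 7], [12, 25]]
  }
}
```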
Option-2: Redesign the old API using the builder pattern, so that a Requirements object carrying the sideInputs can optionally be provided. Deprecate the old API.
// using sideviews
Partition.into(numberOfPartitions).via(
    fn(
        (input, c) -> {
          // use c.sideInput(view)
          // use input
          // return partitionnumber
        },
        requiresSideInputs(view)))

// without using sideviews
Partition.into(numberOfPartitions).via(
    fn((input, c) -> {
      // use input
      // return partitionnumber
    }))
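The builder shape of Option-2 can be sketched in plain Java as well. This is only an illustration under assumptions: PartitionBuilderSketch, its nested Context and Fn types, and the map of side inputs are all hypothetical, and the fn(...) and requiresSideInputs(...) wrappers from the proposal are collapsed into an optional second argument of via(...).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PartitionBuilderSketch<T> {
  /** Hypothetical stand-in for the context handed to the closure. */
  interface Context {
    Object sideInput(String name);
  }

  /** Closure shape from the proposal: (input, c) -> partition number. */
  interface Fn<T> {
    int apply(T input, Context c);
  }

  private final int numPartitions;
  private final Fn<? super T> fn;
  private final Map<String, ?> sideInputs;

  private PartitionBuilderSketch(int numPartitions, Fn<? super T> fn, Map<String, ?> sideInputs) {
    this.numPartitions = numPartitions;
    this.fn = fn;
    this.sideInputs = sideInputs;
  }

  static Builder into(int numPartitions) {
    return new Builder(numPartitions);
  }

  static final class Builder {
    private final int numPartitions;

    Builder(int numPartitions) {
      this.numPartitions = numPartitions;
    }

    /** Variant without side inputs. */
    <T> PartitionBuilderSketch<T> via(Fn<? super T> fn) {
      return new PartitionBuilderSketch<T>(
          numPartitions, fn, Collections.<String, Object>emptyMap());
    }

    /** Variant that declares the side inputs the closure may read. */
    <T> PartitionBuilderSketch<T> via(Fn<? super T> fn, Map<String, ?> sideInputs) {
      return new PartitionBuilderSketch<T>(numPartitions, fn, sideInputs);
    }
  }

  /** Routes each element of an in-memory list to the bucket chosen by the closure. */
  List<List<T>> apply(List<T> input) {
    Context c = name -> sideInputs.get(name);
    List<List<T>> out = new ArrayList<List<T>>();
    for (int i = 0; i < numPartitions; i++) {
      out.add(new ArrayList<T>());
    }
    for (T elem : input) {
      out.get(fn.apply(elem, c)).add(elem);
    }
    return out;
  }

  public static void main(String[] args) {
    PartitionBuilderSketch<Integer> withSide = into(2).via(
        (input, c) -> {
          int threshold = (Integer) c.sideInput("threshold");
          return input < threshold ? 0 : 1;
        },
        Collections.singletonMap("threshold", 10));
    System.out.println(withSide.apply(Arrays.asList(3, 12, 7, 25)));
    // prints [[3, 7], [12, 25]]

    PartitionBuilderSketch<Integer> noSide = into(2).via((input, c) -> input % 2);
    System.out.println(noSide.apply(Arrays.asList(3, 12, 7, 25)));
    // prints [[12], [3, 7, 25]]
  }
}
```

With this shape a single entry point covers both cases, which is what would allow the old API to be deprecated.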