[BEAM-9946] Enhance Partition transform to provide partitionfn with SideInputs - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: P2
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Not applicable
Component/s: sdk-java-core
Labels:
- Done

Description

Currently, the Partition transform splits a collection into n collections, and the PartitionFn can use only the element value to decide which partition an element belongs to.

public interface PartitionFn<T> extends Serializable {
  int partitionFor(T elem, int numPartitions);
}

public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
  return new Partition<>(new PartitionDoFn<T>(numPartitions, partitionFn));
}
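
To make the current contract concrete, here is a minimal plain-Java sketch of element-only partitioning. It does not use Beam: `PartitionSketch` and its `partition` helper are hypothetical names, the nested `PartitionFn` is a local stand-in that only mirrors the interface quoted above, and a `List` stands in for a collection.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
  /** Local stand-in mirroring the PartitionFn shape quoted above; not the Beam class. */
  public interface PartitionFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions);
  }

  /** Routes each element into one of numPartitions lists using only the element value. */
  public static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionFn<? super T> fn) {
    List<List<T>> out = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      out.add(new ArrayList<>());
    }
    for (T elem : input) {
      int p = fn.partitionFor(elem, numPartitions);
      // Reject indices outside [0, numPartitions) instead of failing later.
      if (p < 0 || p >= numPartitions) {
        throw new IllegalArgumentException("partition index out of range: " + p);
      }
      out.get(p).add(elem);
    }
    return out;
  }
}
```

For example, `PartitionSketch.partition(list, 3, (x, n) -> Math.floorMod(x, n))` would group integers by their remainder modulo 3. The function has no way to consult anything other than `x` and `n`, which is the limitation this issue describes.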

It would be useful to introduce a new API that passes side inputs to the partition function. Users could then write logic that uses both the element value and the side inputs to decide which partition an element belongs to.

Option-1: Proposed new API:

public interface PartitionWithSideInputsFn<T> extends Serializable {
  int partitionFor(T elem, int numPartitions, Context c);
}

public static <T> Partition<T> of(int numPartitions, PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements) {
  ...
}
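
As an illustration of how Option-1 could be used, the following plain-Java sketch passes a context to the partition function. It is not Beam code: `SideInputPartitionSketch`, its `partition` helper, and the map-backed `Context` with string keys are all hypothetical stand-ins for the Context and side-input views named in the proposal.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SideInputPartitionSketch {
  /** Hypothetical stand-in for the proposed Context: looks up side-input values by name. */
  public static final class Context {
    private final Map<String, Object> sideInputs;

    public Context(Map<String, Object> sideInputs) {
      this.sideInputs = new HashMap<>(sideInputs);
    }

    @SuppressWarnings("unchecked")
    public <V> V sideInput(String name) {
      return (V) sideInputs.get(name);
    }
  }

  /** Local stand-in mirroring the proposed interface; not the Beam class. */
  public interface PartitionWithSideInputsFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions, Context c);
  }

  /** Routes each element using both its value and whatever the context exposes. */
  public static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionWithSideInputsFn<? super T> fn, Context c) {
    List<List<T>> out = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      out.add(new ArrayList<>());
    }
    for (T elem : input) {
      int p = fn.partitionFor(elem, numPartitions, c);
      if (p < 0 || p >= numPartitions) {
        throw new IllegalArgumentException("partition index out of range: " + p);
      }
      out.get(p).add(elem);
    }
    return out;
  }
}
```

A caller could, for instance, store a threshold under the assumed key `"threshold"` and route elements below it to partition 0 and the rest to partition 1, a decision that the element-only function cannot express without hard-coding the threshold.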

Users can choose either of the two APIs, depending on their partitioning logic.

Option-2: Redesign the old API with the builder pattern, so that a Requirements object carrying side inputs can optionally be supplied. Deprecate the old API.

// using side inputs
Partition.into(numberOfPartitions).via(
  fn(
    (input, c) -> {
      // use c.sideInput(view)
      // use input
      // return partition number
    },
    requiresSideInputs(view))
)
// without using side inputs
Partition.into(numberOfPartitions).via(
  fn((input, c) -> {
    // use input
    // return partition number
  })
)
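
The builder shape of Option-2 can be sketched in plain Java as follows. This is only an illustration under assumptions: `PartitionBuilderSketch` and `Partitioner` are hypothetical names, a `Map` stands in for the side inputs that `requiresSideInputs(view)` would declare, and a `List` stands in for a collection. The point it shows is that one entry point can serve callers with and without side inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class PartitionBuilderSketch {
  /** Result of the builder: splits a list into the configured number of lists. */
  public interface Partitioner<T> {
    List<List<T>> apply(List<T> input);
  }

  private final int numPartitions;

  private PartitionBuilderSketch(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  public static PartitionBuilderSketch into(int numPartitions) {
    return new PartitionBuilderSketch(numPartitions);
  }

  /** Without side inputs: the function receives an empty map as its context. */
  public <T> Partitioner<T> via(BiFunction<T, Map<String, Object>, Integer> fn) {
    return via(fn, Collections.<String, Object>emptyMap());
  }

  /** With side inputs: the map is a hypothetical stand-in for declared side-input views. */
  public <T> Partitioner<T> via(
      BiFunction<T, Map<String, Object>, Integer> fn, Map<String, Object> sideInputs) {
    final int n = numPartitions;
    return input -> {
      List<List<T>> out = new ArrayList<>();
      for (int i = 0; i < n; i++) {
        out.add(new ArrayList<>());
      }
      for (T elem : input) {
        // The function returns the index of the target partition.
        out.get(fn.apply(elem, sideInputs)).add(elem);
      }
      return out;
    };
  }
}
```

With this sketch, `PartitionBuilderSketch.Partitioner<Integer> p = PartitionBuilderSketch.into(2).via((x, c) -> x % 2);` covers the case without side inputs, and passing a map as the second argument to `via` covers the case with them. Unlike the two-overload design of Option-1, the function signature is the same in both cases.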

Issue Links

links to

GitHub Pull Request #11682

People

Assignee: Darshan Jani

Reporter: Darshan Jani

Votes: 0

Watchers: 1

Dates

Created: 11/May/20 01:38

Updated: 06/Oct/20 21:55

Resolved: 12/Jun/20 02:48

Time Tracking

Estimated: 96h

Remaining: 91h 40m

Logged: 4h 20m