Beam / BEAM-9946

Enhance Partition transform to provide PartitionFn with side inputs

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • None
    • Not applicable
    • Component: sdk-java-core

    Description

      Currently, the Partition transform splits a collection into n collections, and its PartitionFn can use only the element value to decide which partition an element belongs to.

      public interface PartitionFn<T> extends Serializable {
        int partitionFor(T elem, int numPartitions);
      }

      public static <T> Partition<T> of(int numPartitions, PartitionFn<? super T> partitionFn) {
        return new Partition<>(new PartitionDoFn<T>(numPartitions, partitionFn));
      }
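
      To make the current behavior concrete, here is a minimal in-memory sketch of an element-only partition function. It does not use Beam: `PartitionFn` below is a local copy of the interface shape quoted above, and `ElementOnlyPartitionSketch` and its `partition` helper are hypothetical names introduced only for illustration.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ElementOnlyPartitionSketch {
  // Local stand-in with the same shape as the PartitionFn quoted above (not the Beam class).
  interface PartitionFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions);
  }

  // Hypothetical helper: routes each element to the bucket chosen by the function.
  static <T> List<List<T>> partition(
      List<T> input, int numPartitions, PartitionFn<? super T> fn) {
    List<List<T>> buckets = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      buckets.add(new ArrayList<>());
    }
    for (T elem : input) {
      int p = fn.partitionFor(elem, numPartitions);
      if (p < 0 || p >= numPartitions) {
        throw new IllegalArgumentException("Partition index out of range: " + p);
      }
      buckets.get(p).add(elem);
    }
    return buckets;
  }

  public static void main(String[] args) {
    // The decision depends only on the element value and the partition count.
    PartitionFn<Integer> byRemainder = (elem, n) -> Math.floorMod(elem, n);
    System.out.println(partition(Arrays.asList(1, 2, 3, 4, 5, 6), 3, byRemainder));
    // prints [[3, 6], [1, 4], [2, 5]]
  }
}
```

      The function has no access to anything beyond `elem` and `numPartitions`, which is the limitation this issue addresses.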
      

      It would be useful to introduce a new API that also provides side inputs to the partition function. Users could then write logic that uses both the element value and the side inputs to decide which partition an element belongs to.

      Option-1: Proposed new API:

      public interface PartitionWithSideInputsFn<T> extends Serializable {
        int partitionFor(T elem, int numPartitions, Context c);
      }

      public static <T> Partition<T> of(int numPartitions, PartitionWithSideInputsFn<? super T> partitionFn, Requirements requirements) {
        ...
      }
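
      As a rough illustration of how a function written against Option-1 might look, the sketch below models the proposed interface in plain Java. The `Context` class here is a hypothetical stand-in that looks side inputs up by name from a map; the side-input name `"threshold"` and the class `SideInputPartitionSketch` are likewise assumptions made for the example, not part of the proposal.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class SideInputPartitionSketch {
  // Hypothetical stand-in for the proposal's Context: side inputs are looked up by name.
  static final class Context {
    private final Map<String, Object> sideInputs;

    Context(Map<String, Object> sideInputs) {
      this.sideInputs = sideInputs;
    }

    Object sideInput(String name) {
      return sideInputs.get(name);
    }
  }

  // Same shape as the interface proposed above.
  interface PartitionWithSideInputsFn<T> extends Serializable {
    int partitionFor(T elem, int numPartitions, Context c);
  }

  // Elements below the "threshold" side input go to partition 0, the rest to
  // partition 1 (or 0 when only one partition exists).
  static final PartitionWithSideInputsFn<Integer> BY_THRESHOLD =
      (elem, numPartitions, c) -> {
        int threshold = (Integer) c.sideInput("threshold");
        return elem < threshold ? 0 : Math.min(1, numPartitions - 1);
      };

  public static void main(String[] args) {
    Map<String, Object> sideInputs = new HashMap<>();
    sideInputs.put("threshold", 10);
    Context c = new Context(sideInputs);
    System.out.println(BY_THRESHOLD.partitionFor(3, 2, c));  // prints 0
    System.out.println(BY_THRESHOLD.partitionFor(12, 2, c)); // prints 1
  }
}
```

      The partition decision now depends on a value that is not carried by the element itself, which the element-only `PartitionFn` cannot express.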
      

      Users can choose either of the two APIs, depending on their partitioning logic.

      Option-2: Redesign the old API using the builder pattern, so that a Requirements object carrying side inputs can optionally be supplied. Deprecate the old API.

      // using side inputs
      Partition.into(numberOfPartitions).via(
        fn(
          (input, c) -> {
            // use c.sideInput(view)
            // use input
            // return partition number
          },
          requiresSideInputs(view))
      )

      // without using side inputs
      Partition.into(numberOfPartitions).via(
        fn((input, c) -> {
          // use input
          // return partition number
        })
      )
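
      One way the builder shape of Option-2 could be typed is sketched below in plain Java, without Beam. Every class in it (`Context`, `Requirements`, `Closure`, `Contextful`, `Partition`, `Builder`) is a hypothetical stand-in written for this example, and side inputs are identified by name rather than by view; it only illustrates that a single `via(fn(...))` entry point can cover both the with and without side-input cases.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionBuilderSketch {
  // Hypothetical stand-in for the context handed to the closure.
  static final class Context {
    private final Map<String, Object> sideInputs;
    Context(Map<String, Object> sideInputs) { this.sideInputs = sideInputs; }
    Object sideInput(String name) { return sideInputs.get(name); }
  }

  // Hypothetical stand-in for Requirements: the names of the side inputs a closure reads.
  static final class Requirements {
    final List<String> sideInputNames;
    private Requirements(List<String> names) { this.sideInputNames = names; }
    static Requirements empty() { return new Requirements(Collections.emptyList()); }
    static Requirements requiresSideInputs(String... names) {
      return new Requirements(Arrays.asList(names));
    }
  }

  interface Closure<T> { int apply(T input, Context c); }

  // Pairs a closure with its requirements, as fn(...) does in the snippet above.
  static final class Contextful<T> {
    final Closure<T> closure;
    final Requirements requirements;
    private Contextful(Closure<T> closure, Requirements requirements) {
      this.closure = closure;
      this.requirements = requirements;
    }
  }

  static <T> Contextful<T> fn(Closure<T> closure) { return fn(closure, Requirements.empty()); }
  static <T> Contextful<T> fn(Closure<T> closure, Requirements requirements) {
    return new Contextful<>(closure, requirements);
  }

  static final class Partition<T> {
    final int numPartitions;
    final Contextful<T> fn;
    private Partition(int numPartitions, Contextful<T> fn) {
      this.numPartitions = numPartitions;
      this.fn = fn;
    }
    static Builder into(int numPartitions) { return new Builder(numPartitions); }

    // Runs the closure and checks that the result is a valid partition index.
    int partitionFor(T input, Context c) {
      int p = fn.closure.apply(input, c);
      if (p < 0 || p >= numPartitions) {
        throw new IllegalStateException("Partition index out of range: " + p);
      }
      return p;
    }
  }

  static final class Builder {
    private final int numPartitions;
    private Builder(int numPartitions) { this.numPartitions = numPartitions; }
    <T> Partition<T> via(Contextful<T> fn) { return new Partition<>(numPartitions, fn); }
  }

  // With a side input: compare each element against a "threshold" value.
  static Partition<Integer> withSideInput() {
    return Partition.into(2).via(fn(
        (Integer input, Context c) -> {
          int threshold = (Integer) c.sideInput("threshold");
          return input < threshold ? 0 : 1;
        },
        Requirements.requiresSideInputs("threshold")));
  }

  // Without side inputs: the context is simply ignored.
  static Partition<Integer> withoutSideInput() {
    return Partition.into(2).via(fn((Integer input, Context c) -> Math.floorMod(input, 2)));
  }

  public static void main(String[] args) {
    Map<String, Object> sideInputs = new HashMap<>();
    sideInputs.put("threshold", 10);
    Context c = new Context(sideInputs);
    System.out.println(withSideInput().partitionFor(3, c));    // prints 0
    System.out.println(withoutSideInput().partitionFor(7, c)); // prints 1
  }
}
```

      In this sketch the no-side-input case is the same call with empty requirements, so only one entry point is needed; whether the real API should take that shape is the design question Option-2 raises.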
      

       

          People

            darshanjani Darshan Jani
            Votes: 0
            Watchers: 1

              Time Tracking

                Original Estimate: 96h
                Remaining Estimate: 91h 40m
                Time Spent: 4h 20m