Beam / BEAM-10056

Side Input Validation too tight, doesn't allow CoGBK

Details

    Description

      The following doesn't pass validation, though it should, as it's a valid signature for a ParDo accepting a PCollection<CoGBK<string, *clientHistory, *clientHistory>>:

      func (fn *writer) StartBundle(ctx context.Context) error

      func (fn *writer) ProcessElement(
      ctx context.Context,
      key string,
      iter1, iter2 func(**clientHistory) bool)

      func (fn *writer) FinishBundle(ctx context.Context)

      It returns an error:

      Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
      Full error:
      inserting ParDo in scope root:
      graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
      side inputs expected in method StartBundle [recovered]
      panic: Missing side inputs in the StartBundle method of a DoFn. If side inputs are present in ProcessElement those side inputs must also be present in StartBundle.
      Full error:
      inserting ParDo in scope root:
      graph.AsDoFn: for Fn named <...pii...>/userpackage.writer:
      side inputs expected in method StartBundle

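      This happens in the input-unaware validation, which cannot tell CoGBK value iterators apart from side inputs. That check needs to be loosened, and the side input check performed elsewhere, once the input is known.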
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L527

      There are "sibling" cases for the DoFn signature:

      func (fn writer) StartBundle(context.Context, side func(*clientHistory) bool) error

      func (fn *writer) ProcessElement(
      ctx context.Context,
      key string,
      iter, side func(**clientHistory) bool)

      func (fn writer) FinishBundle(context.Context, side func(*clientHistory) bool)

      and

      func (fn writer) StartBundle(context.Context, side1, side2 func(*clientHistory) bool) error

      func (fn *writer) ProcessElement(
      ctx context.Context,
      key string,
      side1, side2 func(**clientHistory) bool)

      func (fn writer) FinishBundle(context.Context, side1, side2 func(*clientHistory) bool)

      These would be for a <CoGBK<string, *clientHistory>> main input with one <*clientHistory> side input, and
      for a <string> main input with two <*clientHistory> side inputs, respectively.

      Which case applies can only be fully determined once the input is known, so the check should run when PCollection binding occurs and provide a clear error there.
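      A minimal sketch of what an input-aware check could look like, assuming the number of grouped value streams in the main input is known at binding time. The mainInput type and the sideInputs and checkStartBundle functions are hypothetical and do not exist in the SDK. The rule enforced is the one from the error message above: side inputs present in ProcessElement must also be present in StartBundle.

```go
package main

import "fmt"

// mainInput is a hypothetical description of the bound main PCollection.
type mainInput struct {
	cogbkValues int // grouped value streams; 0 for a non-CoGBK input
}

// sideInputs returns how many of ProcessElement's iterator parameters are
// left over as side inputs once the main input has claimed its share.
func sideInputs(in mainInput, processIters int) (int, error) {
	if processIters < in.cogbkValues {
		return 0, fmt.Errorf("ProcessElement has %d iterators, but the CoGBK input needs %d",
			processIters, in.cogbkValues)
	}
	return processIters - in.cogbkValues, nil
}

// checkStartBundle applies the side input rule only after the main input
// is known, so CoGBK iterators are not mistaken for side inputs.
func checkStartBundle(in mainInput, processIters, startBundleIters int) error {
	sides, err := sideInputs(in, processIters)
	if err != nil {
		return err
	}
	if startBundleIters != sides {
		return fmt.Errorf("StartBundle has %d side inputs, but ProcessElement has %d",
			startBundleIters, sides)
	}
	return nil
}

func main() {
	// Reported case: CoGBK with two value streams, no side inputs.
	fmt.Println(checkStartBundle(mainInput{cogbkValues: 2}, 2, 0) == nil)
	// First sibling: CoGBK with one value stream plus one side input.
	fmt.Println(checkStartBundle(mainInput{cogbkValues: 1}, 2, 1) == nil)
	// Second sibling: plain string input plus two side inputs.
	fmt.Println(checkStartBundle(mainInput{cogbkValues: 0}, 2, 2) == nil)
	// Genuine error: two side inputs that StartBundle does not declare.
	fmt.Println(checkStartBundle(mainInput{cogbkValues: 0}, 2, 0) == nil)
}
```

      The same ProcessElement shape passes or fails depending only on the bound input, which is why the decision cannot be made in the input-unaware pass.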

            People

              lostluck Robert Burke

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m