Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-10302

Respect timestamp OutputTime Windowing Strategy configuration in Lifted CombineFns.

Details

    • New Feature
    • Status: Open
    • P3
    • Resolution: Unresolved
    • None
    • None
    • sdk-go
    • None

    Description

      The Go SDK currently retains an arbitrary timestamp per key per bundle when performing a lifted combine.
      However, depending on the windowing strategy, a prefered time could be specified.
      https://github.com/apache/beam/blob/a5b2046b10bebc59c5bde41d4cb6498058fdada2/model/pipeline/src/main/proto/beam_runner_api.proto#L901

      The code in question for the Go SDK:
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/combine.go#L395

      At present this implementation is "correct", as the default output time is Unspecified, and there's no user mechanism to configure a windowing strategy to this granularity.

      So there are a few parts to this.
      1. Propagate the windowing strategy information to exec.LiftedCombine somehow and implement the correct output. This can be done whether or not 2 is implemented.
      2. Provide a trigger configuration for beam.WindowInto, so this can be configured on the user side. This is significantly more work.

      This matters only when using windows that are not the Global Window, and when using a Lifted Combine, which commonly only happens in batch contexts. However, since Beam is a unified model, the windowing features should work correctly in both execution modes of a Go SDK pipeline.

      Attachments

        Activity

          People

            Unassigned Unassigned
            lostluck Robert Burke
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: