  1. Beam
  2. BEAM-10676

Timers use the input timestamp as the timer output timestamp which prevents watermark progress

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.24.0
    • Component/s: sdk-py-core, sdk-py-harness
    • Labels: None

    Description

      By default, the Python SDK sets each timer's output timestamp to the timestamp of the element being processed when the timer is set. This is problematic because:

      1. Every timer holds back the output watermark at the current element's timestamp, which prevents watermark progress.
      2. It does not match the behavior of the Java SDK, which defaults to using the fire timestamp as the timer output timestamp (and adds a hold on it).
      3. Users cannot influence this behavior because there is no user-facing API for it.
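
      To illustrate the first point, here is a toy model of a watermark hold. It is only a sketch and does not use any Beam API: the function output_watermark and the numbers are hypothetical, and the model assumes the output watermark is simply the minimum of the input watermark and all active holds.

```python
def output_watermark(input_watermark, holds):
    """Toy model (assumption): the output watermark cannot pass the earliest hold."""
    return min([input_watermark] + list(holds))


# Hypothetical scenario: an element with timestamp 10 sets a timer firing at 70,
# while the input watermark has already advanced to 50.
element_timestamp = 10
fire_timestamp = 70
input_watermark = 50

# Hold placed at the element timestamp (the Python SDK behavior described above).
held_at_element = output_watermark(input_watermark, [element_timestamp])

# Hold placed at the fire timestamp (the Java SDK default described above).
held_at_fire = output_watermark(input_watermark, [fire_timestamp])

print(held_at_element)  # 10
print(held_at_fire)     # 50
```

      In this model, the hold at the element timestamp pins the output watermark at 10, while the hold at the fire timestamp lets it follow the input watermark to 50.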

      https://github.com/apache/beam/blob/dfadde2d3ee0a0487362dbcca80388fdc2ef2302/sdks/python/apache_beam/runners/worker/bundle_processor.py#L650

      We should use the fire timestamp as the default output timestamp.
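
      The proposed default could be sketched roughly as follows. This is not the actual bundle_processor.py code: resolve_output_timestamp and its explicit_output_timestamp parameter are hypothetical names, and the explicit override only illustrates what a user-facing option might look like if one were added.

```python
from typing import Optional


def resolve_output_timestamp(
    fire_timestamp: float,
    explicit_output_timestamp: Optional[float] = None,
) -> float:
    """Pick the timer output timestamp (hypothetical helper, not Beam API).

    Falls back to the fire timestamp when no explicit value is given,
    which is the default this issue proposes.
    """
    if explicit_output_timestamp is not None:
        return explicit_output_timestamp
    return fire_timestamp


# With no explicit value, the hold would sit at the fire timestamp.
default_ts = resolve_output_timestamp(fire_timestamp=70)

# A hypothetical explicit value takes precedence over the default.
explicit_ts = resolve_output_timestamp(fire_timestamp=70, explicit_output_timestamp=30)
```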

            People

              Assignee: Maximilian Michels
              Reporter: Maximilian Michels
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m