Beam / BEAM-2535

Allow explicit output time independent of firing specification for all timers

Details

    Description

      Today, we have insufficient control over the event time timestamp of elements output from a timer callback.

      1. For an event time timer, it is the timestamp of the timer itself.
      2. For a processing time timer, it is the current input watermark at the time of processing.

      For both kinds of timer, however, we may want to reserve the right to output at a particular timestamp, i.e. to set a "watermark hold".
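
      As a rough illustration of the two defaults listed above, the current behavior can be modeled as follows. The names `TimeDomain`, `FiringTimer`, and `default_output_timestamp` are hypothetical and exist only for this sketch; they are not Beam APIs, and timestamps are plain integers.

```python
from dataclasses import dataclass
from enum import Enum


class TimeDomain(Enum):
    EVENT_TIME = "event_time"
    PROCESSING_TIME = "processing_time"


@dataclass(frozen=True)
class FiringTimer:
    """Hypothetical model of a timer that is firing (not a Beam class)."""
    domain: TimeDomain
    fire_at: int  # firing time, expressed in the timer's own time domain


def default_output_timestamp(timer: FiringTimer, input_watermark: int) -> int:
    """Event-time timestamp given to elements output from the timer callback today."""
    if timer.domain is TimeDomain.EVENT_TIME:
        # Case 1: the timestamp of the timer itself.
        return timer.fire_at
    # Case 2: the current input watermark at the time of processing.
    return input_watermark
```

      In neither branch can the user choose the output timestamp, which is the gap this issue describes.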

      A naive implementation of a TimerWithWatermarkHold would work for making sure output is not droppable, but does not fully explain window expiration and late data/timer dropping.

      In the natural interpretation of a timer as a feedback loop on a transform, timers should be viewed as another channel of input with its own watermark. Items on that channel all need event time timestamps, even if they are delivered according to a different time domain.

      I propose separating the specification of when a timer should fire (with nice defaults) from the specification of the event time of its resulting outputs. These output timestamps will determine a side channel with a new "timer watermark" that constrains the output watermark.
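
      A minimal sketch of this separation, under one possible reading of the proposal: each pending timer carries both a firing time and an output timestamp, and the timer watermark is assumed to be the minimum output timestamp over all pending timers. `TimerSpec`, `timer_watermark`, and `output_watermark_bound` are hypothetical names, not Beam APIs.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TimerSpec:
    """Hypothetical timer: firing is specified separately from output time."""
    fire_at: int           # when the timer fires, in its own time domain
    output_timestamp: int  # event time of elements output by the callback


def timer_watermark(pending: Iterable[TimerSpec]) -> float:
    """Assumed definition: the earliest output timestamp among pending timers.

    With no pending timers the timer channel imposes no constraint (infinity).
    """
    return min((t.output_timestamp for t in pending), default=math.inf)


def output_watermark_bound(input_watermark: int, pending: Iterable[TimerSpec]) -> float:
    """Upper bound on the output watermark: it may pass neither the input
    watermark nor the timer watermark."""
    return min(input_watermark, timer_watermark(pending))
```

      For example, with pending timers whose output timestamps are 120 and 200 and an input watermark of 150, the bound is 120: the earlier timer acts as a watermark hold until it fires.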

      • Event time timers must still fire according to the input watermark, not the timer watermark, so that they actually fire.
      • Late data dropping and window expiration will be in terms of the minimum of the input watermark and the timer watermark. In this way, whenever a timer is set, the window is not going to be garbage collected.
      • We will need to make sure we have a way to "wake up" a window once it has expired; this may be as simple as exhausting the timer channel as soon as the input watermark indicates expiration of the window.
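
      The three points above can be sketched roughly as follows. This is an illustration under stated assumptions, not a runner implementation: `WindowState`, `is_expired`, and `drain_timers_if_input_expired` are hypothetical names, a window is assumed to expire once the relevant watermark is strictly past its maximum timestamp plus allowed lateness, and timer output timestamps are assumed not to exceed that expiry point.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class WindowState:
    """Hypothetical per-window bookkeeping (not a Beam class)."""
    max_timestamp: int
    allowed_lateness: int
    # Output timestamps of timers that are set but have not yet fired.
    pending_output_timestamps: List[int] = field(default_factory=list)

    def expiry(self) -> int:
        return self.max_timestamp + self.allowed_lateness


def is_expired(input_watermark: int, window: WindowState) -> bool:
    """Expiration is judged against min(input watermark, timer watermark)."""
    timer_wm = min(window.pending_output_timestamps, default=math.inf)
    return min(input_watermark, timer_wm) > window.expiry()


def drain_timers_if_input_expired(input_watermark: int, window: WindowState) -> List[int]:
    """'Wake up' the window: once the input watermark alone is past expiry,
    exhaust the timer channel, returning output timestamps in firing order."""
    if input_watermark <= window.expiry():
        return []
    fired = sorted(window.pending_output_timestamps)
    window.pending_output_timestamps.clear()
    return fired
```

      With a pending timer whose output timestamp lies inside the window, `is_expired` stays false however far the input watermark advances, so the window is not garbage collected; after draining the timers it becomes true.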

      This is mostly aimed at end-user timers in a stateful+timely DoFn. It seems reasonable to use timers as an implementation detail (e.g. in runners-core utilities) without any of this additional machinery, for example when there is no possibility of output from the timer callback.

            People

              reuvenlax Reuven Lax
              kenn Kenneth Knowles
              Votes: 0
              Watchers: 12

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 25h 10m