Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-9161

Inconsistent view of current ActiveBundle from main and bundle timer thread

Details

    • Bug
    • Status: Resolved
    • P2
    • Resolution: Fixed
    • None
    • 2.20.0
    • runner-flink
    • None

    Description

      In the DoFnOperator and the subclassed ExecutableStageDoFnOperator there are
      effectively two processing threads:

      (1) The main processing thread for processing elements and watermarks
      (2) A timer thread to support ending bundles after a timeout to optimize for
      latency

      Although the code was written with a different assumption in the past, the two
      threads do not interleave with each other. Only one is active at a time. This is
      ensured by Flink's "checkpointLock" which is acquired for every method called on
      the operator like processElement, processWatermark, snapshotState. It is also
      acquired when timers are fired which are set using Flink's TimeService, like it
      is the case for (2).

      We've seen issues with bundles being closed multiple times resulting in
      exceptions like "Already closed". Very rarely, we've also seen dead bundle ids
      resurrecting for which no other explanation could be found than an inconsistent
      view of the two thread.

      Attachments

        Issue Links

          Activity

            People

              mxm Maximilian Michels
              mxm Maximilian Michels
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m