Flink / FLINK-13205

Checkpoints/savepoints injection has loose ordering properties when a stop-with-savepoint is triggered


Details

    Description

      When a stop-with-savepoint is triggered at a source task, the thread pool of the task's dispatcher (Task.asyncCallDispatcher) is extended: it changes from single-threaded to multi-threaded.

      As a result, consecutive checkpoints/savepoints from the dispatcher's queue can be applied concurrently and race with each other, so they are no longer strictly ordered in the event stream.

      Consequently, checkpoints/savepoints that are injected later than they should be may be "silently subsumed": they could be ignored and never reported to the checkpoint coordinator.
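To illustrate the ordering concern, here is a minimal standalone sketch. It does not use Flink's classes: DispatcherOrderingSketch, injectBarriers and isSorted are hypothetical names, and plain java.util.concurrent executors stand in for Task.asyncCallDispatcher. A single-threaded executor applies queued calls in submission order, while a multi-threaded pool gives no such guarantee.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the task's dispatcher; not Flink code.
final class DispatcherOrderingSketch {

    /** Submits checkpoint ids 1..count and returns the order in which they were applied. */
    static List<Long> injectBarriers(ExecutorService dispatcher, int count) {
        List<Long> applied = Collections.synchronizedList(new ArrayList<>());
        for (long id = 1; id <= count; id++) {
            final long checkpointId = id;
            // Each call models "inject checkpoint/savepoint <id> into the event stream".
            dispatcher.execute(() -> applied.add(checkpointId));
        }
        dispatcher.shutdown();
        try {
            dispatcher.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for dispatcher", e);
        }
        return new ArrayList<>(applied);
    }

    /** Returns true if the ids are in strictly increasing order. */
    static boolean isSorted(List<Long> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1) >= ids.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Long> single = injectBarriers(Executors.newSingleThreadExecutor(), 1000);
        List<Long> multi = injectBarriers(Executors.newFixedThreadPool(4), 1000);
        System.out.println("single-threaded in order: " + isSorted(single));
        // The multi-threaded result depends on scheduling and may differ between runs.
        System.out.println("multi-threaded in order:  " + isSorted(multi));
    }
}
```

With the single-threaded executor the ids always come out in submission order. With the pool of four threads the order depends on scheduling, which is the kind of loose ordering described above.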

      Proposed solution:

      Revert Task.asyncCallDispatcher to being single-threaded.
      For the stop-with-savepoint feature, the dispatcher thread that performs the synchronous savepoint doesn't need to block; the StreamTask.finishTask() invocation can be delegated to StreamTask.notifyCheckpointComplete().
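The proposed control flow could be sketched roughly as follows. This is a simplified model under stated assumptions, not Flink's actual StreamTask: the class StopWithSavepointSketch, its fields, and the triggerCheckpoint signature are hypothetical. Only the idea that notifyCheckpointComplete() invokes finishTask() comes from the proposal. All calls are assumed to arrive on one dispatcher thread, so no call ever blocks waiting for another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of a task; not Flink's StreamTask.
final class StopWithSavepointSketch {
    private final List<String> events = new ArrayList<>();
    private long pendingSyncSavepointId = -1L; // -1 means "no synchronous savepoint pending"
    private boolean finished;

    /** Injects a barrier and returns immediately, even for a synchronous savepoint. */
    void triggerCheckpoint(long id, boolean synchronousSavepoint) {
        if (finished) {
            return; // nothing is injected once the task has finished
        }
        events.add("barrier-" + id);
        if (synchronousSavepoint) {
            pendingSyncSavepointId = id; // remember it instead of blocking the thread
        }
    }

    /** Finishes the task once the pending synchronous savepoint is confirmed. */
    void notifyCheckpointComplete(long id) {
        if (!finished && id == pendingSyncSavepointId) {
            finishTask();
        }
    }

    void finishTask() {
        finished = true;
        events.add("finish");
    }

    boolean isFinished() {
        return finished;
    }

    List<String> events() {
        return new ArrayList<>(events);
    }

    public static void main(String[] args) throws InterruptedException {
        StopWithSavepointSketch task = new StopWithSavepointSketch();
        // A single dispatcher thread keeps all calls strictly ordered.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        dispatcher.execute(() -> task.triggerCheckpoint(1, false));
        dispatcher.execute(() -> task.triggerCheckpoint(2, true));
        dispatcher.execute(() -> task.notifyCheckpointComplete(1));
        dispatcher.execute(() -> task.notifyCheckpointComplete(2));
        dispatcher.shutdown();
        dispatcher.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(task.events());
    }
}
```

Because the synchronous savepoint only records its id and returns, the dispatcher thread stays free to process later calls in order, so the extra threads are no longer needed.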

      Note: IMO, the issue described here is not critical, but the proposed change should simplify the implementation. This ticket can be considered an enhancement.

            People

              1u0 Alex