Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-2491

Support Checkpoints After Tasks Finished

    XMLWordPrintableJSON

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 0.10.0
    • Fix Version/s: 1.14.0
    • Labels:
      None

      Description

      While implementing a test case for the Kafka Consumer, I came across the following bug:

      Consider the following topology, with the operator parallelism in parentheses:

      Source (2) --> Sink (1).

      In this setup, the snapshotState() method is called on the source, but not on the Sink.
      The sink receives the generated data.
      only one of the two sources is generating data.

      I've implemented a test case for this, you can find it here: https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java

        Attachments

          Issue Links

          1.
          Refactor the CheckpointCoordinator to compute the tasks to trigger/wait/commit dynamically Sub-task Closed Yun Gao
          2.
          Modify the logic of computing tasks to trigger/wait/commit to consider finished tasks Sub-task Closed Yun Gao
          3.
          Stores finished status for fully finished operators Sub-task Closed Yun Gao
          4.
          Decline Checkpoint if some tasks finished before get triggered Sub-task Closed Dawid Wysakowicz
          5.
          Allows tasks to report finish state Sub-task Closed Yun Gao
          6.
          Handle UnionListState with finished operators Sub-task Closed Yun Gao
          7.
          Handle BroadcastState with finished operators Sub-task Closed Yun Gao
          8.
          Refactor StreamTask hierarchy to support triggering checkpoint via RPC for non-source tasks Sub-task Closed Yun Gao
          9.
          Support checkpoint alignment with finished input channels Sub-task Closed Yun Gao
          10.
          StreamTask waits for all the asynchronous step of pending checkpoint to finish Sub-task Closed Yun Gao
          11.
          CheckpointCoordinator pass the flag about whether a operator is fully finished on recovery Sub-task Closed Yun Gao
          12.
          Skip the execution of the fully finished operators after recovery Sub-task Closed Yun Gao
          13.
          Re-compute tasks to trigger when tasks get triggered before finished Sub-task Closed Unassigned
          14.
          Check for illegal modifications of JobGraph with finished operators Sub-task Closed Dawid Wysakowicz
          15.
          Check for illegal modifications of JobGraph with partially finished operators Sub-task Closed Yun Gao
          16.
          Deprecate/Remove StreamOperator#dispose method Sub-task Closed Dawid Wysakowicz
          17.
          Add finish method to the SinkFunction Sub-task Closed Dawid Wysakowicz
          18.
          Call StreamOperator#finish on EndOfData Sub-task Closed Dawid Wysakowicz
          19.
          Wait for a checkpoint completed after finishing a task Sub-task Closed Dawid Wysakowicz
          20.
          Add a global flag for enabling/disabling final checkpoints Sub-task Closed Dawid Wysakowicz
          21.
          Do not create transaction in TwoPhaseCommitSinkFunction after finish() Sub-task Closed Yuan Mei
          22.
          Trigger global failover for synchronous savepoints on CheckpointCoordinator Sub-task Closed Unassigned
          23.
          Pass watermarks in finishedOnRestore tasks Sub-task Closed Piotr Nowojski
          24.
          When task is finishedOnRestore Operators shouldn't be used Sub-task Closed Piotr Nowojski
          25.
          notifyCheckpointComplete without corresponding snapshotState Sub-task Closed Yun Gao
          26.
          FLIP-147 Checkpoint N has not been started at all Sub-task Closed Yun Gao
          27.
          Bypass operators when advanceToEndOfEventTime for both legacy and new source tasks Sub-task Closed Yun Gao
          28.
          FLIP-147: Unable to recover after source fully finished Sub-task Closed Yun Gao
          29.
          Take snapshots for operator coordinators which all corresponding tasks finished Sub-task Closed Yun Gao
          30.
          Fix the concurrent problem between triggering savepoint with drain and finish Sub-task Closed Yun Gao
          31.
          Double check if non-keyed FullyFinishedOperatorState can be mixed up with non finished OperatorState on recovery Sub-task Closed Unassigned
          32.
          Do not resume channels if the barrier is received via RPC Sub-task Closed Yun Gao
          33.
          Respect the finished flag when extracting operator states due to skip in-flight data Sub-task Closed Yun Gao
          34.
          Upgrade the TwoPhaseCommitSink to support empty transaction after finished Sub-task Closed Yun Gao
          35.
          CheckpointBarrierHandler may skip the markAlignmentStart for alignment -with-timeout checkpoint Sub-task Closed Yun Gao
          36.
          Verify the final checkpoint and stop-with-savepoint --drain Sub-task Closed Yangze Guo
          37.
          Add E2E/ITCase test for checkpoints after tasks finished Sub-task Closed Roman Khachatryan
          38.
          Document FLIP-147 capabiliites and limitations Sub-task Closed Dawid Wysakowicz
          39.
          Log improvement for aborting checkpoint due to tasks are finishing Sub-task Closed Yun Gao

            Activity

              People

              • Assignee:
                gaoyunhaii Yun Gao
                Reporter:
                rmetzger Robert Metzger
              • Votes:
                17 Vote for this issue
                Watchers:
                61 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: