Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-2491

Support Checkpoints After Tasks Finished

Attach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Implemented
    • 0.10.0
    • 1.14.0
    • None

    Description

      While implementing a test case for the Kafka Consumer, I came across the following bug:

      Consider the following topology, with the operator parallelism in parentheses:

      Source (2) --> Sink (1).

      In this setup, the snapshotState() method is called on the source, but not on the Sink.
      The sink receives the generated data.
      only one of the two sources is generating data.

      I've implemented a test case for this, you can find it here: https://github.com/rmetzger/flink/blob/para_checkpoint_bug/flink-tests/src/test/java/org/apache/flink/test/checkpointing/ParallelismChangeCheckpoinedITCase.java

      Attachments

        Issue Links

        1.
        Refactor the CheckpointCoordinator to compute the tasks to trigger/wait/commit dynamically Sub-task Closed Yun Gao Actions
        2.
        Modify the logic of computing tasks to trigger/wait/commit to consider finished tasks Sub-task Closed Yun Gao Actions
        3.
        Stores finished status for fully finished operators Sub-task Closed Yun Gao Actions
        4.
        Decline Checkpoint if some tasks finished before get triggered Sub-task Closed Dawid Wysakowicz Actions
        5.
        Allows tasks to report finish state Sub-task Closed Yun Gao Actions
        6.
        Handle UnionListState with finished operators Sub-task Closed Yun Gao Actions
        7.
        Handle BroadcastState with finished operators Sub-task Closed Yun Gao Actions
        8.
        Refactor StreamTask hierarchy to support triggering checkpoint via RPC for non-source tasks Sub-task Closed Yun Gao Actions
        9.
        Support checkpoint alignment with finished input channels Sub-task Closed Yun Gao Actions
        10.
        StreamTask waits for all the asynchronous step of pending checkpoint to finish Sub-task Closed Yun Gao Actions
        11.
        CheckpointCoordinator pass the flag about whether a operator is fully finished on recovery Sub-task Closed Yun Gao Actions
        12.
        Skip the execution of the fully finished operators after recovery Sub-task Closed Yun Gao Actions
        13.
        Re-compute tasks to trigger when tasks get triggered before finished Sub-task Closed Unassigned Actions
        14.
        Check for illegal modifications of JobGraph with finished operators Sub-task Closed Dawid Wysakowicz Actions
        15.
        Check for illegal modifications of JobGraph with partially finished operators Sub-task Closed Yun Gao Actions
        16.
        Deprecate/Remove StreamOperator#dispose method Sub-task Closed Dawid Wysakowicz Actions
        17.
        Add finish method to the SinkFunction Sub-task Closed Dawid Wysakowicz Actions
        18.
        Call StreamOperator#finish on EndOfData Sub-task Closed Dawid Wysakowicz Actions
        19.
        Wait for a checkpoint completed after finishing a task Sub-task Closed Dawid Wysakowicz Actions
        20.
        Add a global flag for enabling/disabling final checkpoints Sub-task Closed Dawid Wysakowicz Actions
        21.
        Do not create transaction in TwoPhaseCommitSinkFunction after finish() Sub-task Closed Yuan Mei Actions
        22.
        Trigger global failover for synchronous savepoints on CheckpointCoordinator Sub-task Closed Unassigned Actions
        23.
        Pass watermarks in finishedOnRestore tasks Sub-task Closed Piotr Nowojski Actions
        24.
        When task is finishedOnRestore Operators shouldn't be used Sub-task Closed Piotr Nowojski Actions
        25.
        notifyCheckpointComplete without corresponding snapshotState Sub-task Closed Yun Gao Actions
        26.
        FLIP-147 Checkpoint N has not been started at all Sub-task Closed Yun Gao Actions
        27.
        Bypass operators when advanceToEndOfEventTime for both legacy and new source tasks Sub-task Closed Yun Gao Actions
        28.
        FLIP-147: Unable to recover after source fully finished Sub-task Closed Yun Gao Actions
        29.
        Take snapshots for operator coordinators which all corresponding tasks finished Sub-task Closed Yun Gao Actions
        30.
        Fix the concurrent problem between triggering savepoint with drain and finish Sub-task Closed Yun Gao Actions
        31.
        Double check if non-keyed FullyFinishedOperatorState can be mixed up with non finished OperatorState on recovery Sub-task Closed Unassigned Actions
        32.
        Do not resume channels if the barrier is received via RPC Sub-task Closed Yun Gao Actions
        33.
        Respect the finished flag when extracting operator states due to skip in-flight data Sub-task Closed Yun Gao Actions
        34.
        Upgrade the TwoPhaseCommitSink to support empty transaction after finished Sub-task Closed Yun Gao Actions
        35.
        CheckpointBarrierHandler may skip the markAlignmentStart for alignment -with-timeout checkpoint Sub-task Closed Yun Gao Actions
        36.
        Verify the final checkpoint and stop-with-savepoint --drain Sub-task Closed Yangze Guo Actions
        37.
        Add E2E/ITCase test for checkpoints after tasks finished Sub-task Closed Roman Khachatryan Actions
        38.
        Document FLIP-147 capabiliites and limitations Sub-task Closed Dawid Wysakowicz Actions
        39.
        Log improvement for aborting checkpoint due to tasks are finishing Sub-task Closed Yun Gao Actions

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            gaoyunhaii Yun Gao
            rmetzger Robert Metzger
            Votes:
            17 Vote for this issue
            Watchers:
            57 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment