FLINK-13698

Rework threading model of CheckpointCoordinator

Details

    Description

      Currently, CheckpointCoordinator and CheckpointFailureManager code is executed by several different threads (mostly the ioExecutor, but not only). This causes multiple concurrency issues, for example: https://issues.apache.org/jira/browse/FLINK-13497

      A proper fix would be to rethink the threading model there. At first glance, this code does not seem to need to be multi-threaded, except for the parts doing the actual IO operations. It should therefore be possible to run everything in one single thread (the ExecutionGraph's) and run only the necessary IO operations asynchronously, with some feedback loop ("mailbox style").
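
      To illustrate what "mailbox style" could look like, here is a minimal sketch. It is not Flink code: the class MailboxStyleCoordinator, its methods, and the writeMetadata stand-in are all hypothetical names, and a plain single-thread executor plays the role of the ExecutionGraph's thread. All state is read and written on that one thread, the blocking work goes to an IO pool, and the result is posted back to the single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of a "mailbox style" coordinator; not Flink's CheckpointCoordinator. */
class MailboxStyleCoordinator implements AutoCloseable {
    // Assumption: stands in for the ExecutionGraph's thread. Only this thread touches state.
    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Pool for blocking IO. It never reads or writes coordinator state.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    // Plain, unsynchronized state: safe because it is confined to mainThread.
    private final List<Long> completed = new ArrayList<>();
    private long nextCheckpointId = 1;

    /** Triggers a checkpoint; the future completes with its id once state is updated. */
    CompletableFuture<Long> triggerCheckpoint() {
        return CompletableFuture
            // 1. Allocate the id on the main thread.
            .supplyAsync(() -> nextCheckpointId++, mainThread)
            // 2. Run the slow IO on the IO pool.
            .thenApplyAsync(id -> { writeMetadata(id); return id; }, ioExecutor)
            // 3. Feedback loop: post the result back to the main thread.
            .thenApplyAsync(id -> { completed.add(id); return id; }, mainThread);
    }

    /** Returns a copy of the completed ids, read on the main thread. */
    CompletableFuture<List<Long>> snapshotCompleted() {
        return CompletableFuture.supplyAsync(() -> new ArrayList<>(completed), mainThread);
    }

    // Hypothetical stand-in for a blocking IO operation.
    private static void writeMetadata(long id) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void close() {
        ioExecutor.shutdown();
        mainThread.shutdown();
    }
}
```

      In this sketch no locks are needed, because the IO threads only ever see the checkpoint id passed to them and every state change is serialized through the single-thread executor's queue (the "mailbox").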

      I would strongly recommend fixing this issue before adding new features in the CheckpointCoordinator component.

          People

            Assignee: Unassigned
            Reporter: Piotr Nowojski (pnowojski)

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m
