Flink / FLINK-13698

Rework threading model of CheckpointCoordinator


Details

    Description

      Currently, code in CheckpointCoordinator and CheckpointFailureManager is executed by several different threads (mostly the ioExecutor, but not exclusively). This causes multiple concurrency issues, for example: https://issues.apache.org/jira/browse/FLINK-13497

      A proper fix would be to rethink the threading model there. At first glance, this code does not seem to need to be multi-threaded, except for the parts performing the actual IO operations. It should therefore be possible to run everything in the single ExecutionGraph thread and run only the necessary IO operations asynchronously, with a feedback loop that hands their results back to that thread ("mailbox style").
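      The proposed "mailbox style" model can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Flink's actual API: the class MailboxCoordinatorSketch, its methods, the stand-in writeMetadata() IO call, and the use of a plain single-thread ExecutorService as the "main thread" are all assumptions. Coordinator state lives in non-thread-safe collections that only the main thread touches; only the IO step runs on ioExecutor, and its completion is posted back to the main thread via CompletableFuture.thenRunAsync.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: single-threaded coordinator state, IO offloaded, results fed back via a mailbox. */
public class MailboxCoordinatorSketch {

    // The "mailbox": a single thread that performs every state access.
    private final ExecutorService mainThreadExecutor;
    // Only blocking IO runs here; it must never touch coordinator state.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
    private volatile Thread mainThread;

    // Deliberately non-thread-safe: safe only because the main thread is the sole accessor.
    private final Map<Long, String> pendingCheckpoints = new HashMap<>();
    private final List<Long> completedCheckpoints = new ArrayList<>();

    public MailboxCoordinatorSketch() {
        mainThreadExecutor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "coordinator-main");
            mainThread = t; // remembered so state accesses can be checked
            return t;
        });
    }

    private void assertMainThread() {
        if (Thread.currentThread() != mainThread) {
            throw new IllegalStateException("coordinator state accessed outside the main thread");
        }
    }

    /** Hops onto the main thread, offloads IO, then posts the result back to the main thread. */
    public CompletableFuture<Void> triggerCheckpoint(long checkpointId) {
        return CompletableFuture
                .runAsync(() -> {
                    assertMainThread();
                    pendingCheckpoints.put(checkpointId, "PENDING");
                }, mainThreadExecutor)
                // Only the IO part leaves the main thread.
                .thenRunAsync(() -> writeMetadata(checkpointId), ioExecutor)
                // Feedback loop: the IO completion is enqueued back into the mailbox.
                .thenRunAsync(() -> {
                    assertMainThread();
                    pendingCheckpoints.remove(checkpointId);
                    completedCheckpoints.add(checkpointId);
                }, mainThreadExecutor);
    }

    // Stand-in for blocking IO such as persisting checkpoint metadata (hypothetical).
    private void writeMetadata(long checkpointId) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Reads also go through the mailbox, so they never race with mutations. */
    public List<Long> getCompletedCheckpoints() {
        return CompletableFuture
                .supplyAsync(() -> {
                    assertMainThread();
                    return new ArrayList<>(completedCheckpoints);
                }, mainThreadExecutor)
                .join();
    }

    public void shutdown() {
        ioExecutor.shutdown();
        mainThreadExecutor.shutdown();
        try {
            ioExecutor.awaitTermination(5, TimeUnit.SECONDS);
            mainThreadExecutor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        MailboxCoordinatorSketch coordinator = new MailboxCoordinatorSketch();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (long id = 1; id <= 5; id++) {
            futures.add(coordinator.triggerCheckpoint(id));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        System.out.println(coordinator.getCompletedCheckpoints().size()); // prints 5
        coordinator.shutdown();
    }
}
```

      Because every read and write of coordinator state is funneled through one executor, no locks are needed, and the assertMainThread() check turns an accidental access from another thread into an immediate failure instead of a latent race.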

      I would strongly recommend fixing this issue before adding new features to the CheckpointCoordinator component.


            People

              Assignee: Unassigned
              Reporter: Piotr Nowojski (pnowojski)
              Votes: 1
              Watchers: 19

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 20m