FLINK-5073

ZooKeeperCompletedCheckpointStore executes blocking delete operation in ZooKeeper client thread

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.3, 1.2.0
    • Fix Version/s: 1.1.4, 1.2.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      When deleting completed checkpoints from the ZooKeeperCompletedCheckpointStore, one first tries to delete the meta state handle from ZooKeeper and then deletes the actual checkpoint in a callback of the delete operation. This callback is executed by the ZooKeeper client's main thread, which is problematic because it blocks the ZooKeeper client. If a delete operation takes longer than completing a checkpoint, delete operations for outdated checkpoints can pile up, because they are effectively executed sequentially.

      I propose executing the delete operations in a dedicated Executor so that the client's main thread stays free for ZooKeeper-related work.
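
A minimal sketch of the proposed hand-off, not the actual Flink change. The names `AsyncCheckpointDiscard`, `Checkpoint`, `onMetaHandleRemoved`, and `ioExecutor` are hypothetical, and a single-threaded executor stands in for the ZooKeeper client's main thread. The point is only that the delete callback schedules the slow discard on a dedicated Executor and returns immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncCheckpointDiscard {

    /** Hypothetical stand-in for a completed checkpoint whose state is slow to delete. */
    interface Checkpoint {
        void discardState() throws Exception;
    }

    /**
     * Intended to be called from the delete callback. It only schedules the
     * discard on ioExecutor, so the calling thread is not blocked by it.
     */
    static void onMetaHandleRemoved(Checkpoint checkpoint, Executor ioExecutor) {
        ioExecutor.execute(() -> {
            try {
                checkpoint.discardState();
            } catch (Exception e) {
                System.err.println("Could not discard checkpoint: " + e);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Assumption: one thread models the ZooKeeper client's main thread.
        ExecutorService clientThread = Executors.newSingleThreadExecutor();
        // Dedicated pool that absorbs the slow delete operations.
        ExecutorService ioExecutor = Executors.newFixedThreadPool(2);

        int checkpoints = 3;
        CountDownLatch discarded = new CountDownLatch(checkpoints);

        for (int i = 0; i < checkpoints; i++) {
            final int id = i;
            Checkpoint checkpoint = () -> {
                Thread.sleep(200); // simulate a slow state deletion
                System.out.println("discarded checkpoint " + id
                        + " on " + Thread.currentThread().getName());
                discarded.countDown();
            };
            // The simulated delete callback runs on the client thread
            // and merely hands the work to the dedicated executor.
            clientThread.execute(() -> onMetaHandleRemoved(checkpoint, ioExecutor));
        }

        clientThread.shutdown();
        boolean callbacksDone = clientThread.awaitTermination(1, TimeUnit.SECONDS);
        System.out.println("client thread finished its callbacks: " + callbacksDone);

        discarded.await(5, TimeUnit.SECONDS);
        ioExecutor.shutdown();
    }
}
```

With this structure, a slow delete occupies only a thread of the dedicated executor. The size of that pool then bounds how many discards run concurrently, which is a design choice this sketch leaves open.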

          People

            Assignee: trohrmann Till Rohrmann
            Reporter: trohrmann Till Rohrmann
            Votes: 0
            Watchers: 2
