Flink / FLINK-5073

ZooKeeperCompletedCheckpointStore executes blocking delete operation in ZooKeeper client thread

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.3
    • Fix Version/s: 1.2.0, 1.1.4
    • Labels:
      None

      Description

      When completed checkpoints are deleted from the ZooKeeperCompletedCheckpointStore, the store first tries to delete the meta state handle from ZooKeeper and then deletes the actual checkpoint in a callback of that delete operation. The callback is executed by the ZooKeeper client's main thread, which is problematic because it blocks the ZooKeeper client. If a delete operation takes longer than completing a checkpoint, delete operations for outdated checkpoints can even pile up, because they are effectively executed sequentially.

      I propose executing the delete operations in a dedicated Executor, so that the client's main thread stays free for ZooKeeper-related work.
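
      A minimal sketch of the proposed hand-off, using only JDK executors. It is not the actual Flink change: the class CheckpointDeleteOffload, the method removeCheckpoints, the thread names, and the sleep that stands in for discarding checkpoint data are all hypothetical, and a single-threaded executor is assumed as a stand-in for the ZooKeeper client's callback thread. The callback only submits the slow discard to the dedicated executor and returns immediately.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CheckpointDeleteOffload {

    /**
     * Simulates removing `count` checkpoints and returns the names of the
     * threads on which the (hypothetical) slow discard work actually ran.
     */
    static List<String> removeCheckpoints(int count, long discardMillis) {
        // Assumption: stands in for the ZooKeeper client's single callback thread.
        ExecutorService clientThread = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "zk-client-event-thread"));

        // Dedicated executor for the blocking delete work.
        AtomicInteger ids = new AtomicInteger();
        ExecutorService deleteExecutor = Executors.newFixedThreadPool(
                2, r -> new Thread(r, "checkpoint-delete-" + ids.incrementAndGet()));

        List<String> discardThreads = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(count);
        try {
            for (int i = 0; i < count; i++) {
                // The "callback" runs on the client thread but only hands off work.
                clientThread.execute(() -> deleteExecutor.execute(() -> {
                    try {
                        Thread.sleep(discardMillis); // stand-in for discarding checkpoint data
                        discardThreads.add(Thread.currentThread().getName());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }));
            }
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("discards did not finish in time");
            }
            return discardThreads;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for discards", e);
        } finally {
            clientThread.shutdown();
            deleteExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Every recorded thread name belongs to the dedicated delete executor.
        System.out.println(removeCheckpoints(4, 50));
    }
}
```

      With a pool of more than one thread, slow discards no longer run strictly one after another, and the client thread is never blocked by them. A real implementation would also need to decide how discard failures are reported and when the executor is shut down.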

            People

            • Assignee: Till Rohrmann
            • Reporter: Till Rohrmann
            • Votes: 0
            • Watchers: 2