FLINK-6612

ZooKeeperStateHandleStore does not guard against concurrent delete operations

    Details

      Description

      The ZooKeeperStateHandleStore does not guard against concurrent delete operations, which can happen when leadership is lost and then granted anew. The problem is that checkpoint nodes can get deleted even after they have been recovered by another ZooKeeperCompletedCheckpointStore. This corrupts the recovered checkpoint and thwarts future recoveries.

      I propose to add reference counting to the ZooKeeperStateHandleStore. That way, we can track how many concurrent processes hold a given checkpoint node. Only when the reference count reaches 0 are we allowed to delete the checkpoint node and dispose of the checkpoint data.

      Stephan proposed to use ephemeral child nodes to track the reference count of a checkpoint node. That way we can be sure that locks on a checkpoint node are released in case of JobManager failures.
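
      The proposal can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the actual ZooKeeperStateHandleStore code: the class `LockedNodeSketch`, its method names, the store ids and the node path are all hypothetical, and a plain map stands in for ZooKeeper. Each store registers one child per node it uses, and a node is deleted only once its last child is gone.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for a ZooKeeper subtree (hypothetical, not Flink's implementation). */
class LockedNodeSketch {
    // Checkpoint node path -> ids of the stores that currently hold a lock child on it.
    private final Map<String, Set<String>> locksByNode = new HashMap<>();

    /** Creates the node if it is missing and registers a lock child named after the store. */
    synchronized void lock(String nodePath, String storeId) {
        locksByNode.computeIfAbsent(nodePath, p -> new HashSet<>()).add(storeId);
    }

    /** Releases the caller's lock; deletes the node only if no lock remains. Returns true if deleted. */
    synchronized boolean releaseAndTryDelete(String nodePath, String storeId) {
        Set<String> locks = locksByNode.get(nodePath);
        if (locks == null) {
            return false; // node is already gone
        }
        locks.remove(storeId);
        if (locks.isEmpty()) {
            locksByNode.remove(nodePath); // reference count reached 0
            return true;
        }
        return false; // another store still references the node
    }

    synchronized boolean exists(String nodePath) {
        return locksByNode.containsKey(nodePath);
    }

    public static void main(String[] args) {
        LockedNodeSketch zk = new LockedNodeSketch();
        String node = "/checkpoints/42"; // hypothetical path

        zk.lock(node, "old-leader"); // store that wrote the checkpoint
        zk.lock(node, "new-leader"); // store that recovered it after the leadership change

        // The old leader cleans up late; the node must survive for the new leader.
        System.out.println("deleted by old leader: " + zk.releaseAndTryDelete(node, "old-leader"));
        System.out.println("node still exists: " + zk.exists(node));
        System.out.println("deleted by new leader: " + zk.releaseAndTryDelete(node, "new-leader"));
    }
}
```

      In this model the late delete issued by the old leader only drops its own reference, so the checkpoint recovered by the new leader stays intact.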

          Activity

          till.rohrmann Till Rohrmann added a comment -

          1.4.0: 3d119e1155aa8930cc7b18a085d6790cb2c63b70
          1.3.0: f58fec70fef12056bd58b6cc2985532ccb07625e

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/3940

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3940

          Merged in f58fec70fef12056bd58b6cc2985532ccb07625e

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann closed the pull request at:

          https://github.com/apache/flink/pull/3939

          githubbot ASF GitHub Bot added a comment -

          Github user tillrohrmann commented on the issue:

          https://github.com/apache/flink/pull/3939

          Merged in 3d119e1155aa8930cc7b18a085d6790cb2c63b70

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/3939

          Thanks for your work, @tillrohrmann! I merged this manually. Please close this PR and the JIRA issue.

          githubbot ASF GitHub Bot added a comment -

          Github user StefanRRichter commented on the issue:

          https://github.com/apache/flink/pull/3939

          LGTM +1. Tested that this runs with incremental checkpoints; however, the test did not (yet) cover a "split brain" scenario with two JobManagers running in parallel.

          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3940

          [backport 1.3] FLINK-6612 Allow ZooKeeperStateHandleStore to lock created ZNodes

          Backport of #3939 onto the `release-1.3` branch.

          In order to guard against deletions of ZooKeeper nodes which are still being used
          by a different ZooKeeperStateHandleStore, we have to introduce a locking mechanism.
          Only after all ZooKeeperStateHandleStores have released their lock, the ZNode is
          allowed to be deleted.

          The locking mechanism is implemented via ephemeral child nodes of the respective
          ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to lock a ZNode, thus
          protecting it from being deleted, it creates an ephemeral child node. The node's
          name is unique to the ZooKeeperStateHandleStore instance. The delete operations
          will then only delete the node if it does not have any children associated.

          In order to guard against orphaned lock nodes, they are created as ephemeral nodes.
          This means that they will be deleted by ZooKeeper once the connection of the
          ZooKeeper client which created the node times out.
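
          The orphaned-lock case can be sketched with an in-memory model. This is an
          illustration under stated assumptions, not the code of this pull request and not
          the ZooKeeper client API: `EphemeralLockSketch`, its methods, the session ids and
          the paths are hypothetical. It only mimics the ZooKeeper behavior that ephemeral
          nodes disappear when the session that created them ends.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory model of ephemeral lock nodes (hypothetical, not the ZooKeeper or Flink API). */
class EphemeralLockSketch {
    // Persistent nodes, e.g. one per completed checkpoint.
    private final Set<String> nodes = new HashSet<>();
    // Ephemeral lock path -> id of the client session that created it.
    private final Map<String, String> lockOwners = new HashMap<>();

    void createNode(String nodePath) {
        nodes.add(nodePath);
    }

    /** Creates an ephemeral child under nodePath that is owned by the given session. */
    void lock(String nodePath, String lockName, String sessionId) {
        if (!nodes.contains(nodePath)) {
            throw new IllegalStateException("No such node: " + nodePath);
        }
        lockOwners.put(nodePath + "/" + lockName, sessionId);
    }

    /** Mimics ZooKeeper removing every ephemeral node of a session that has ended. */
    void expireSession(String sessionId) {
        lockOwners.values().removeIf(owner -> owner.equals(sessionId));
    }

    /** Deletes the node only if it has no lock children; returns true if it was deleted. */
    boolean tryDelete(String nodePath) {
        String childPrefix = nodePath + "/";
        boolean locked = lockOwners.keySet().stream().anyMatch(p -> p.startsWith(childPrefix));
        return !locked && nodes.remove(nodePath);
    }

    public static void main(String[] args) {
        EphemeralLockSketch zk = new EphemeralLockSketch();
        zk.createNode("/checkpoints/7");
        zk.lock("/checkpoints/7", "lock-jm-1", "session-1");

        // The lock holder crashes without releasing its lock: the node stays protected.
        System.out.println("deleted while locked: " + zk.tryDelete("/checkpoints/7"));

        // Once its session ends, the ephemeral lock vanishes and the node can be deleted.
        zk.expireSession("session-1");
        System.out.println("deleted after expiry: " + zk.tryDelete("/checkpoints/7"));
    }
}
```

          Because the lock is tied to the session rather than to an explicit release call,
          a crashed lock holder cannot block deletion forever in this model.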

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink addZooKeeperRefCountingBackport

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3940.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3940


          commit ccb3ac8f2b64a357275d2967fcd440947becd272
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-05-17T12:52:04Z

          FLINK-6612 Allow ZooKeeperStateHandleStore to lock created ZNodes

          In order to guard against deletions of ZooKeeper nodes which are still being used
          by a different ZooKeeperStateHandleStore, we have to introduce a locking mechanism.
          Only after all ZooKeeperStateHandleStores have released their lock, the ZNode is
          allowed to be deleted.

          The locking mechanism is implemented via ephemeral child nodes of the respective
          ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to lock a ZNode, thus
          protecting it from being deleted, it creates an ephemeral child node. The node's
          name is unique to the ZooKeeperStateHandleStore instance. The delete operations
          will then only delete the node if it does not have any children associated.

          In order to guard against orphaned lock nodes, they are created as ephemeral nodes.
          This means that they will be deleted by ZooKeeper once the connection of the
          ZooKeeper client which created the node times out.


          githubbot ASF GitHub Bot added a comment -

          GitHub user tillrohrmann opened a pull request:

          https://github.com/apache/flink/pull/3939

          FLINK-6612 Allow ZooKeeperStateHandleStore to lock created ZNodes

          In order to guard against deletions of ZooKeeper nodes which are still being used
          by a different ZooKeeperStateHandleStore, we have to introduce a locking mechanism.
          Only after all ZooKeeperStateHandleStores have released their lock, the ZNode is
          allowed to be deleted.

          The locking mechanism is implemented via ephemeral child nodes of the respective
          ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to lock a ZNode, thus
          protecting it from being deleted, it creates an ephemeral child node. The node's
          name is unique to the ZooKeeperStateHandleStore instance. The delete operations
          will then only delete the node if it does not have any children associated.

          In order to guard against orphaned lock nodes, they are created as ephemeral nodes.
          This means that they will be deleted by ZooKeeper once the connection of the
          ZooKeeper client which created the node times out.

          cc @StefanRRichter

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/tillrohrmann/flink addZooKeeperRefCounting

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3939.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3939


          commit def077a8d95921645733169d420d548842dde257
          Author: Till Rohrmann <trohrmann@apache.org>
          Date: 2017-05-17T12:52:04Z

          FLINK-6612 Allow ZooKeeperStateHandleStore to lock created ZNodes

          In order to guard against deletions of ZooKeeper nodes which are still being used
          by a different ZooKeeperStateHandleStore, we have to introduce a locking mechanism.
          Only after all ZooKeeperStateHandleStores have released their lock, the ZNode is
          allowed to be deleted.

          The locking mechanism is implemented via ephemeral child nodes of the respective
          ZooKeeper node. Whenever a ZooKeeperStateHandleStore wants to lock a ZNode, thus
          protecting it from being deleted, it creates an ephemeral child node. The node's
          name is unique to the ZooKeeperStateHandleStore instance. The delete operations
          will then only delete the node if it does not have any children associated.

          In order to guard against orphaned lock nodes, they are created as ephemeral nodes.
          This means that they will be deleted by ZooKeeper once the connection of the
          ZooKeeper client which created the node times out.



            People

            • Assignee: Till Rohrmann
            • Reporter: Till Rohrmann
            • Votes: 0
            • Watchers: 4
