Flink / FLINK-27848

ZooKeeperLeaderElectionDriver keeps writing leader information, using up zxid

Details

    Description

      After a leadership change, the new leader may keep writing its (identical) leader information to ZK, which quickly uses up the zxid on ZK.

      The problem is that, in ZooKeeperLeaderElectionDriver#retrieveLeaderInformationFromZooKeeper, leaderElectionEventHandler.onLeaderInformationChange(LeaderInformation.empty()) is called regardless of whether childData is null. When childData is non-null, this causes the driver to keep re-writing the leader information to ZK.

      The problem was introduced in FLINK-24038, and only affects the legacy ZooKeeperHaServices. Thus, only 1.15 is affected.
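
      The feedback loop described above can be illustrated with a small, self-contained model. This is only a sketch under assumptions, not the actual Flink code: FakeNode, Handler and simulate are hypothetical names, a null string stands in for LeaderInformation.empty(), and a plain write counter stands in for zxid consumption. It assumes the leader re-writes its information whenever the reported information differs from what it confirmed, and that every write triggers another node-change notification.

```java
import java.util.Objects;

/** Simplified, hypothetical model of the re-write loop; not the real Flink classes. */
final class LeaderInfoLoopSketch {

    /** Stand-in for the ZooKeeper leader node. Each write would consume a zxid. */
    static final class FakeNode {
        String data; // null models "childData == null"
        int writes;  // number of writes performed so far
    }

    /** Stand-in for the election event handler owned by the current leader. */
    static final class Handler {
        private final String confirmed;
        private final FakeNode node;

        Handler(String confirmed, FakeNode node) {
            this.confirmed = confirmed;
            this.node = node;
        }

        /** Re-writes the confirmed info if the reported info differs; returns true if it wrote. */
        boolean onLeaderInformationChange(String reported) {
            if (!Objects.equals(confirmed, reported)) {
                node.data = confirmed;
                node.writes++;
                return true;
            }
            return false;
        }
    }

    /** Processes at most maxEvents node-change notifications and returns the total write count. */
    static int simulate(boolean buggy, int maxEvents) {
        FakeNode node = new FakeNode();
        Handler handler = new Handler("leader-A", node);

        // Initial write after gaining leadership; it triggers the first notification.
        node.data = "leader-A";
        node.writes = 1;
        boolean pendingEvent = true;

        for (int i = 0; i < maxEvents && pendingEvent; i++) {
            // Buggy variant: always report "empty", even though the node holds data.
            // Fixed variant: report "empty" only when the node really has no data.
            String reported = buggy ? null : node.data;
            pendingEvent = handler.onLeaderInformationChange(reported);
        }
        return node.writes;
    }

    public static void main(String[] args) {
        System.out.println("buggy writes: " + simulate(true, 1000)); // prints "buggy writes: 1001"
        System.out.println("fixed writes: " + simulate(false, 1000)); // prints "fixed writes: 1"
    }
}
```

      In this model the buggy variant writes once per notification until the event budget runs out, while the fixed variant stops after the initial write because the reported information already matches.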

        Activity

          reswqa reswqa added a comment -

          Hi xtsong, I have prepared a pull request for this ticket. Could you help review it?

          xtsong Xintong Song added a comment -

          Fixed in:

          • release-1.15: 73a33ab5f25dd5ac7e8adb1521c092a8aedcc736
          mapohl Matthias Pohl added a comment - - edited

          I'm reopening this issue to provide forward(?)ports for 1.16 and 1.17.

          Refactoring the leader election for FLIP-285/FLINK-26522 is kind of tricky. I'm trying to slice the code changes into meaningful commits (and ideally dedicated PRs) to make the review process easier.

          I ran into this issue when refactoring the code and merging classes into one which also required adapting tests. This revealed the inconsistency/bug in the ZooKeeperLeaderElectionDriver implementation. Merging the bugfixes into 1.17 and 1.16 makes the other changes more reasonable/consistent.

          More specifically, this bug was revealed in ZooKeeperLeaderElectionTest.testLeaderShouldBeCorrectedWhenOverwritten when changing from the deprecated NodeCache to CuratorCache. The new CuratorCacheListener allows us to be more selective about whether we expect a node creation or a node change, which causes a test failure. The previous test implementation worked because we sent the 2nd write operation after writing the leader information, which caused a node-change event and ultimately made the test pass.

          mapohl Matthias Pohl added a comment -

          master: d76214de330fea485f2971b51c172420dfdf500b
          1.16: 5441cef7b0310965822f4bca0f12bf63790c4d69
          1.15: 73a33ab5f25dd5ac7e8adb1521c092a8aedcc736 (copied over from Xintong's comment above)


          People

            mapohl Matthias Pohl
            xtsong Xintong Song