ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-569

Failure of elected leader can lead to never-ending leader election

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels:
      None

      Description

      It is possible for basic LeaderElection to enter a situation where it never terminates.

      As an example, consider a three node cluster A, B and C.

      1. In the first round, A votes for A, B votes for B and C votes for C
      2. Since C > B > A, all nodes resolve to vote for C in the second round as there is no first round winner
      3. A, B vote for C, but C fails.
      4. C is not elected because neither A nor B hear from it, and so votes for it are discarded
      5. A and B never reset their votes, despite not hearing from C, so continue to vote for it ad infinitum.

      Step 5 is the bug. If A and B reset their votes to themselves in the case where the heard-from vote set is empty, leader election will continue.

      I do not know if this affects running ZK clusters, as it is possible that the out-of-band failure detection protocols may cause leader election to be restarted anyhow, but I've certainly seen this in tests.

      I have a trivial patch which fixes it, but it needs a test (and tests for race conditions are hard to write!)

      1. zookeeper-569.patch
        15 kB
        Henry Robinson
      2. ZOOKEEPER-569.patch
        14 kB
        Flavio Junqueira
      3. zookeeper-569.patch
        15 kB
        Henry Robinson
      4. zookeeper-569.patch
        15 kB
        Henry Robinson
      5. zookeeper-569.patch
        7 kB
        Henry Robinson
      6. zookeeper-569.patch
        7 kB
        Henry Robinson

        Issue Links

          Activity

          Hide
          Flavio Junqueira added a comment -

          One way to implement the test is to implement a mock server to force the particular message interleaving that triggers the bug. No claim it is the best way, but it seemed to be a good idea for FLELostMessageTest.

          Show
          Flavio Junqueira added a comment - One way to implement the test is to implement a mock server to force the particular message interleaving that triggers the bug. No claim it is the best way, but it seemed to be a good idea for FLELostMessageTest.
          Hide
          Henry Robinson added a comment -

          Here's a functional patch for this issue. Looking into making a mock test now - would like to use Mockito or similar, but will probably have to use hand-rolled mocks for now.

          Show
          Henry Robinson added a comment - Here's a functional patch for this issue. Looking into making a mock test now - would like to use Mockito or similar, but will probably have to use hand-rolled mocks for now.
          Hide
          Henry Robinson added a comment -

          Running this by Hudson to verify the build, despite not including new tests.

          (It will probably +1 if tests pass because I have slightly edited LETest)

          Show
          Henry Robinson added a comment - Running this by Hudson to verify the build, despite not including new tests. (It will probably +1 if tests pass because I have slightly edited LETest)
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12434553/zookeeper-569.patch
          against trunk revision 903483.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/61/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434553/zookeeper-569.patch against trunk revision 903483. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/61/console This message is automatically generated.
          Hide
          Henry Robinson added a comment -

          Forgot --no-prefix from git. Hopefully this will apply.

          Show
          Henry Robinson added a comment - Forgot --no-prefix from git. Hopefully this will apply.
          Hide
          Patrick Hunt added a comment -

          Mockito looks good. If someone wants to include as a testing option please enter a JIRA/patch.

          Show
          Patrick Hunt added a comment - Mockito looks good. If someone wants to include as a testing option please enter a JIRA/patch.
          Hide
          Flavio Junqueira added a comment -

          Henry, I was taking a look at the patch, and I'm slightly confused about how it goes, so I was wondering if you could give me a hand on understanding it.

          It seems to me that in the situation you describe, heardFrom won't be empty, so the checking for heardFrom == 0 wouldn't work. Instead, I think you have to call countVotes and check if there is any vote left after it returns, no?

          Show
          Flavio Junqueira added a comment - Henry, I was taking a look at the patch, and I'm slightly confused about how it goes, so I was wondering if you could give me a hand on understanding it. It seems to me that in the situation you describe, heardFrom won't be empty, so the checking for heardFrom == 0 wouldn't work. Instead, I think you have to call countVotes and check if there is any vote left after it returns, no?
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12434555/zookeeper-569.patch
          against trunk revision 903483.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434555/zookeeper-569.patch against trunk revision 903483. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/62/console This message is automatically generated.
          Hide
          Benjamin Reed added a comment -

          i'm also wondering about the heardFrom == 0. in your case A and B will still be up, so heardFrom will not be zero. don't you really want to check whether or not you heard from guy that you think is the leader?

          Show
          Benjamin Reed added a comment - i'm also wondering about the heardFrom == 0. in your case A and B will still be up, so heardFrom will not be zero. don't you really want to check whether or not you heard from guy that you think is the leader?
          Hide
          Henry Robinson added a comment -

          Yes, you're both right! I misread my own notes on the bug :/

          I'm writing tests for a real fix now. Thanks both for pointing this out.

          Show
          Henry Robinson added a comment - Yes, you're both right! I misread my own notes on the bug :/ I'm writing tests for a real fix now. Thanks both for pointing this out.
          Hide
          Henry Robinson added a comment -

          Here's a patch with tests that appears to fix the issue (test fails without fix, test succeeds with). All tests pass for me with this patch on my laptop.

          I have replaced one kludge with another here. QuorumPeer.electionAlg is set to null when electionType==0 until the election is actually run. This causes problems if you want to retrieve the electionAlg object via getElectionAlg() beforehand for tests.

          I've set it up so that makeLEStrategy always creates a new LeaderElection if electionType == 0, but also that createElectionAlgorithm sets electionAlg=new LeaderElection(this) instead of null, so that as long as startLeaderElection has been called, getElectionAlg() won't return null.

          I've checked to see if this will cause any obvious problems for the call sites of getElectionAlg and couldn't find anything that expected null. It seems more consistent to me this way. The question I have is over why LeaderElection needs re-instantiating each time when FLE does not.

          If this sounds confusing, it's because the code really is! The interaction of createElectionAlgorithm, startLeaderElection and makeLEStrategy is hard to discern.

          Show
          Henry Robinson added a comment - Here's a patch with tests that appears to fix the issue (test fails without fix, test succeeds with). All tests pass for me with this patch on my laptop. I have replaced one kludge with another here. QuorumPeer.electionAlg is set to null when electionType==0 until the election is actually run. This causes problems if you want to retrieve the electionAlg object via getElectionAlg() beforehand for tests. I've set it up so that makeLEStrategy always creates a new LeaderElection if electionType == 0, but also that createElectionAlgorithm sets electionAlg=new LeaderElection(this) instead of null, so that as long as startLeaderElection has been called, getElectionAlg() won't return null. I've checked to see if this will cause any obvious problems for the call sites of getElectionAlg and couldn't find anything that expected null. It seems more consistent to me this way. The question I have is over why LeaderElection needs re-instantiating each time when FLE does not. If this sounds confusing, it's because the code really is! The interaction of createElectionAlgorithm, startLeaderElection and makeLEStrategy is hard to discern.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12434729/zookeeper-569.patch
          against trunk revision 903483.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434729/zookeeper-569.patch against trunk revision 903483. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/65/console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          Thanks, Henry, it looks good. I agree with your comment on the confusion between LE between instantiated every time it is used, and FLE behaving differently. We should really just have one model.

          One comment on the patch is that I don't think you need to instantiate QuorumCnxManager in mockServer() on the new test. The conditional block that checks the listener can also be removed.

          Show
          Flavio Junqueira added a comment - Thanks, Henry, it looks good. I agree with your comment on the confusion between LE between instantiated every time it is used, and FLE behaving differently. We should really just have one model. One comment on the patch is that I don't think you need to instantiate QuorumCnxManager in mockServer() on the new test. The conditional block that checks the listener can also be removed.
          Hide
          Mahadev konar added a comment -

          henry, are you working on a new patch adressing flavio's comments?

          Show
          Mahadev konar added a comment - henry, are you working on a new patch adressing flavio's comments?
          Hide
          Henry Robinson added a comment -

          Yes, hoping to get it out this week.

          Show
          Henry Robinson added a comment - Yes, hoping to get it out this week.
          Hide
          Henry Robinson added a comment -

          Slightly updated patch - Flavio, I believe you do need to instantiate QuorumCnxManager as QCM.Listener is not static and therefore can't be instantiated without an enclosing instance. I have removed the check for listener == null and set the thread timeout to 15s to tally with LETest.java.

          Show
          Henry Robinson added a comment - Slightly updated patch - Flavio, I believe you do need to instantiate QuorumCnxManager as QCM.Listener is not static and therefore can't be instantiated without an enclosing instance. I have removed the check for listener == null and set the thread timeout to 15s to tally with LETest.java.
          Hide
          Flavio Junqueira added a comment -

          Henry, you do need an instance of QCM to get the listener, but you don't need the listener because you're using LE.

          I'm uploading a patch in which I have commented out some lines of the new test you wrote. This is how I think it should look like. If you agree, please remove those lines and submit a new patch.

          Note that I have also changed the leader election parameter in the new instance of QuorumPeer in mockServer to 0.

          Show
          Flavio Junqueira added a comment - Henry, you do need an instance of QCM to get the listener, but you don't need the listener because you're using LE. I'm uploading a patch in which I have commented out some lines of the new test you wrote. This is how I think it should look like. If you agree, please remove those lines and submit a new patch. Note that I have also changed the leader election parameter in the new instance of QuorumPeer in mockServer to 0.
          Hide
          Henry Robinson added a comment -

          I'm making really heavy weather of this Flavio, thanks, you're completely right. You also exposed a weird issue that the listener was actually causing the test to succeed no matter what messages were sent - but if the fix for this issue was not in place the test would fail. This is fixed by using the right port for the DatagramSocket.

          New patch hopefully has the DatagramSocket listening on the right port, with everything else in order.

          Show
          Henry Robinson added a comment - I'm making really heavy weather of this Flavio, thanks, you're completely right. You also exposed a weird issue that the listener was actually causing the test to succeed no matter what messages were sent - but if the fix for this issue was not in place the test would fail. This is fixed by using the right port for the DatagramSocket. New patch hopefully has the DatagramSocket listening on the right port, with everything else in order.
          Hide
          Patrick Hunt added a comment -

          Henry, there are two patches, please highlight which one the review should review. thx

          Show
          Patrick Hunt added a comment - Henry, there are two patches, please highlight which one the review should review. thx
          Hide
          Henry Robinson added a comment -

          The most recent patch I submitted is the right patch - it includes Flavio's suggestions.

          Show
          Henry Robinson added a comment - The most recent patch I submitted is the right patch - it includes Flavio's suggestions.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12435629/zookeeper-569.patch
          against trunk revision 912052.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435629/zookeeper-569.patch against trunk revision 912052. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h7.grid.sp2.yahoo.net/68/console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          +1, thanks Henry! I have just committed it (Committed revision 912119).

          Show
          Flavio Junqueira added a comment - +1, thanks Henry! I have just committed it (Committed revision 912119).
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #703 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/703/)
          . Failure of elected leader can lead to never-ending leader
          election (henry via flavio)

          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #703 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/703/ ) . Failure of elected leader can lead to never-ending leader election (henry via flavio)

            People

            • Assignee:
              Henry Robinson
              Reporter:
              Henry Robinson
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development