ZooKeeper / ZOOKEEPER-569

Failure of elected leader can lead to never-ending leader election


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: None
    • Labels: None

    Description

      It is possible for basic LeaderElection to enter a situation where it never terminates.

      As an example, consider a three node cluster A, B and C.

      1. In the first round, A votes for A, B votes for B, and C votes for C.
      2. No node wins the first round, and since C > B > A, all nodes resolve to vote for C in the second round.
      3. A and B vote for C, but C fails.
      4. C is not elected because neither A nor B hears from it, so the votes for it are discarded.
      5. A and B never reset their votes despite not hearing from C, so they continue to vote for it ad infinitum.

      Step 5 is the bug. If A and B reset their votes to themselves in the case where the heard-from vote set is empty, leader election will continue.
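
      The scenario and the proposed reset can be illustrated with a toy model. This is only a sketch and not ZooKeeper's actual LeaderElection code: the class name VoteResetSketch, the elect method and its parameters are hypothetical, node ids 1, 2 and 3 stand in for A, B and C, and picking the highest heard-from id when a round has no winner is a simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the three-node scenario; NOT ZooKeeper's real election code. */
class VoteResetSketch {

    /**
     * Runs up to maxRounds rounds among the live nodes.
     * Returns the elected node id, or -1 if no leader emerges.
     * votes maps voter id to the candidate it currently votes for (mutated in place).
     */
    static int elect(Map<Integer, Integer> votes, Set<Integer> live,
                     int clusterSize, boolean resetOnEmpty, int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            // Count only votes for candidates that were heard from (still live);
            // votes for a failed node are discarded, as in step 4.
            Map<Integer, Integer> tally = new HashMap<>();
            for (int voter : live) {
                int candidate = votes.get(voter);
                if (live.contains(candidate)) {
                    tally.merge(candidate, 1, Integer::sum);
                }
            }
            // A candidate wins with a strict majority of the whole cluster.
            for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
                if (e.getValue() > clusterSize / 2) {
                    return e.getKey();
                }
            }
            if (tally.isEmpty()) {
                if (!resetOnEmpty) {
                    continue; // step 5: stale votes are kept, so nothing ever changes
                }
                // Proposed fix: every live node falls back to voting for itself.
                for (int voter : live) {
                    votes.put(voter, voter);
                }
            } else {
                // No winner: all live nodes move to the highest heard-from candidate
                // (simplified tie-break, assumed for this sketch).
                int best = Collections.max(tally.keySet());
                for (int voter : live) {
                    votes.put(voter, best);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Set<Integer> live = Set.of(1, 2); // C (id 3) has failed
        // State after step 3: everyone's last vote is for C.
        Map<Integer, Integer> stale = new HashMap<>(Map.of(1, 3, 2, 3, 3, 3));
        System.out.println(elect(stale, live, 3, false, 10)); // prints -1
        Map<Integer, Integer> fixed = new HashMap<>(Map.of(1, 3, 2, 3, 3, 3));
        System.out.println(elect(fixed, live, 3, true, 10));  // prints 2
    }
}
```

      In this model, without the reset the tally is empty in every round and the election never completes. With the reset, A and B first vote for themselves, then both move to B, which reaches a majority of the three-node cluster.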

      I do not know if this affects running ZK clusters, as it is possible that the out-of-band failure detection protocols may cause leader election to be restarted anyhow, but I've certainly seen this in tests.

      I have a trivial patch that fixes it, but it needs a test (and tests for race conditions are hard to write!).

      Attachments

        1. zookeeper-569.patch
          7 kB
          Henry Robinson
        2. zookeeper-569.patch
          7 kB
          Henry Robinson
        3. zookeeper-569.patch
          15 kB
          Henry Robinson
        4. zookeeper-569.patch
          15 kB
          Henry Robinson
        5. ZOOKEEPER-569.patch
          14 kB
          Flavio Paiva Junqueira
        6. zookeeper-569.patch
          15 kB
          Henry Robinson


          People

            Assignee: Henry Robinson (henryr)
            Reporter: Henry Robinson (henryr)