ZooKeeper / ZOOKEEPER-1005

Zookeeper servers fail to elect a leader successfully.

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2
    • Fix Version/s: 3.5.4, 3.6.0
    • Component/s: quorum
    • Labels:
      None
    • Environment:

      zookeeper-3.2.2; debian

      Description

      We were running 3 zookeeper servers, and simulated a failure on one of the servers.

      One zookeeper node follows the other, but has trouble connecting. The following exception looks like the cause:

      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --  [org.apache.zookeeper.server.ZooKeeperServer] Created server
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.Follower] Following zookeeper3/192.168.131.11:2888
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] --  [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING java.net.ConnectException: --  Connection refused
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.PlainSocketImpl.socketConnect(Native Method)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at java.net.Socket.connect(Socket.java:546)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING  --     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
      

      The last exception while connecting was:

      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] --  [org.apache.zookeeper.server.quorum.Follower] Unexpected exception
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: --  Connection refused
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.PlainSocketImpl.socketConnect(Native Method)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at java.net.Socket.connect(Socket.java:546)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR  --     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
      2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] --  [org.apache.zookeeper.server.quorum.Follower] Exception when following the leader
      

      The leader started leading a bit later:

      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0, 94489312534, 25, 2, LOOKING, LOOKING, 0
      2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote
      2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --  [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
      2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
      2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.QuorumPeer] LEADING
      

      But at that time the follower had already terminated and started a new election, so the leader failed:

      2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.ZooKeeperServer] Created server
      2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.persistence.FileSnap] Reading snapshot /var/lib/zookeeper/version-2/snapshot.1600007d16
      2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.persistence.FileTxnSnapLog] Snapshotting: 1600007d16
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --  [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
      2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
      2011-03-01T14:03:11+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] --  [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING  --     at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
      2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
      2011-03-01T14:03:32+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
      2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] --  [org.apache.zookeeper.server.quorum.Leader] Shutdown called
      2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO  --     at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:371)
      2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO  --     at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:297)
      2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO  --     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:562)
      

      From http://zookeeper.apache.org/doc/r3.2.2/zookeeperStarted.html:

      The new entry, initLimit is timeouts ZooKeeper uses to limit the length of time the ZooKeeper servers in quorum have to connect to a leader

      Since we have initLimit=10 and tickTime=4000, we should have 40 seconds for a zookeeper server to contact the leader.
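      For reference, the relevant quorum settings in our zoo.cfg look roughly like the following. Only tickTime and initLimit are confirmed values from this report; the syncLimit value and the server list are illustrative placeholders:

      # zoo.cfg (illustrative excerpt)
      tickTime=4000                    # one tick = 4000 ms
      initLimit=10                     # tickTime*initLimit = 40000 ms = 40 s
      syncLimit=5                      # placeholder; actual value not recorded in this report
      server.1=zookeeper1:2888:3888
      server.2=zookeeper2:2888:3888
      server.3=zookeeper3:2888:3888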

      However, in the source code src/java/main/org/apache/zookeeper/server/quorum/Follower.java:

      152                 for (int tries = 0; tries < 5; tries++) {
      153                     try {
      154                         //sock = new Socket();
      155                         //sock.setSoTimeout(self.tickTime * self.initLimit);
      156                         sock.connect(addr, self.tickTime * self.syncLimit);
      157                         sock.setTcpNoDelay(nodelay);
      158                         break;
      159                     } catch (IOException e) {
      160                         if (tries == 4) {
      161                             LOG.error("Unexpected exception",e);
      162                             throw e;
      163                         } else {
      164                             LOG.warn("Unexpected exception, tries="+tries,e);
      165                             sock = new Socket();
      166                             sock.setSoTimeout(self.tickTime * self.initLimit);
      167                         }
      168                     }
      169                     Thread.sleep(1000);
      170                 }
      

      It appears that we only have about 4 seconds to contact the leader: five connection attempts roughly one second apart. The timeouts are applied to the socket, but since nothing is listening on the leader's port yet, each attempt fails immediately with "Connection refused" instead of waiting out the timeout. The retry loop does not take into account that the zookeeper leader may not have started its zookeeper service yet.
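      For illustration, here is roughly the behaviour we expected from the documentation: a connect loop bounded by elapsed time (tickTime * initLimit) rather than by a fixed number of attempts. This is only a sketch under that assumption, not a tested patch, and the class and method names are made up:

      import java.io.IOException;
      import java.net.InetSocketAddress;
      import java.net.Socket;

      // Sketch only; not ZooKeeper source and not a tested patch. Retries until
      // tickTime * initLimit has elapsed, so an immediate "Connection refused"
      // (leader process not listening yet) cannot exhaust a fixed number of
      // attempts within a few seconds.
      public final class DeadlineConnect {
          static Socket connectWithinInitLimit(InetSocketAddress addr, int tickTime,
                  int initLimit, int syncLimit) throws IOException, InterruptedException {
              long deadline = System.currentTimeMillis() + (long) tickTime * initLimit;
              while (true) {
                  Socket sock = new Socket();
                  try {
                      // Same per-attempt connect timeout as Follower.java line 156.
                      sock.connect(addr, tickTime * syncLimit);
                      sock.setTcpNoDelay(true);
                      return sock;
                  } catch (IOException e) {
                      try { sock.close(); } catch (IOException ignore) { }
                      if (System.currentTimeMillis() >= deadline) {
                          throw e; // initLimit budget exhausted; give up as the code does now
                      }
                      Thread.sleep(1000); // same 1 s pause between attempts as the current code
                  }
              }
          }
      }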

      Is this the expected behaviour? Or is the expected behaviour that followers should always be able to connect to the leader?

        Activity

        Alexandre Hardy added a comment -

        I should note that several other failures were also simulated, with nodes replaced from time to time. But this particular issue seems to have prevented the zookeeper service from becoming available for a period of 2 hours. After 2 hours the quorum was reestablished and zookeeper recovered gracefully (at first glance).

        Vishal Kher added a comment -

        Hi Alexandre,

        Can you attach the zookeeper logs and zoo.cfg files? Also, please describe your test in detail.

        I am not 100% sure whether it is a good idea to wait for 40 seconds as you suggested. Here's a scenario to think about:
        Assume that we have a slow follower instead of a slow leader. Before the follower enters the for loop in question, there is a change in leader. Now, when the follower attempts to connect to the peer that it thinks is the leader (the old leader), it will fail with the same error you are seeing. In this case, it is not worth asking the follower to wait for 40 seconds in that loop. Currently, ZK does not preempt a peer once it has exited leader election (and is either trying to be a leader or a follower).

        Note that there have been many fixes in leader election since ZK 3.2.2.

        Flavio Junqueira added a comment -

        Hi Alexandre, We do not assume that a follower must be able to connect to an elected leader. Since a process might crash at any time, there is the possibility that a process crashes right after being elected, so we don't want a follower waiting indefinitely on such a leader.

        I don't understand how this problem of a follower timing out before the leader exercises its role leads to a 2-hour delay. If a connection from a leader to a follower times out, then the follower goes back to leader election, and a new leader is supposed to arise in a few seconds.

        As Vishal pointed out, there have been some fixes to leader election and to the logic that connects a leader and its followers since 3.2.2.

        Alexandre Hardy added a comment -

        Hi Vishal,

        Can you attach the zookeeper logs and zoo.cfg files? Also, please describe your test in detail.

        Unfortunately our auto-installer process reinstalled the nodes before I could get complete logs off the machines. I will attach the zoo.cfg shortly.

        I am not 100% sure whether it is a good idea to wait for 40 seconds as you suggested. Here's a scenario to think about:
        Assume that we have a slow follower instead of a slow leader. Before the follower enters the for loop in question, there is a change in leader. Now, when the follower attempts to connect to the peer that it thinks is the leader (the old leader), it will fail with the same error you are seeing. In this case, it is not worth asking the follower to wait for 40 seconds in that loop. Currently, ZK does not preempt a peer once it has exited leader election (and is either trying to be a leader or a follower).

        I agree that the follower should not wait forever, and 40 seconds may be too long. But these settings can be configured in zoo.cfg. I also think that the leader taking so long to actually start the leader process is a problem in itself (which should be addressed separately, and may be related to our particular failure simulation) and a definite contributing factor to this issue.

        I am just surprised that the documentation indicates that a follower should continue to attempt to connect to the leader for the time specified by initLimit, but that does not happen in this case. So the choice of initLimit is a worthy debate, but that does not change the fact that a follower should be trying to connect to the leader until the initLimit time has elapsed (according to the documentation). The timeout on the socket is set to self.tickTime * self.initLimit, which seems to indicate that the initLimit time is adhered to, but only for certain classes of failures.

        Note that there have been many fixes in leader election since ZK 3.2.2.

        We are committed to zookeeper 3.2.2 for the moment, since we do not have sufficient resources to test a later zookeeper in time for inclusion in our next product release. We will be implementing other workarounds for the moment, but it seemed like a good idea to raise the issue and see whether this is the desired behavior.

        Alexandre Hardy added a comment -

        Hi Flavio,

        Hi Alexandre, We do not assume that a follower must be able to connect to an elected leader. Since a process might crash at any time, there is the possibility that a process crashes right after being elected, so we don't want a follower waiting indefinitely on such a leader.

        Yes, I agree with that. My question in this case is not whether the follower should be waiting, but rather how long the follower should wait. It seems to me that the follower may be waiting for a shorter period of time than the documentation indicates it should. Please see my response to Vishal's comment for details.

        I don't understand how this problem of a follower timing out before the leader exercises its role leads to a 2-hour delay. If a connection from a leader to a follower times out, then the follower goes back to leader election, and a new leader is supposed to arise in a few seconds.

        Only two of the three zookeeper nodes were up. The follower did go back to leader election, after which a new leader was elected under the same conditions as listed in the log entries above. So this cycle (election starts, leader chosen, follower fails to contact leader, leader terminates, election starts again ...) repeated for a very long time, until the leader eventually started its service in time for the follower to connect.

        I have no doubt that some other factor was influencing the leader, and preventing the leader from starting the leader service in a timely fashion. And yes, that has to be sorted out. I think we will be able to work around our problem, so an immediate fix is not the prime concern. At this point I am more interested in determining if this is the correct behavior.

        As Vishal pointed out, there have been some fixes to leader election and to the logic that connects a leader and its followers since 3.2.2.

        Unfortunately we are committed to zookeeper 3.2.2 for the moment, due to our product release dates and available resources.

        Vishal Kher added a comment -

        I agree that the follower should not wait forever, and 40 seconds may be too long. But these settings can be configured in zoo.cfg. I also think that the leader taking so long to actually start the leader process is a problem in itself (which should be addressed separately, and may be related to our particular failure simulation) and a definite contributing factor to this issue.

        I am just surprised that the documentation indicates that a follower should continue to attempt to connect to the leader for the time specified by initLimit, but that does not happen in this case. So the choice of initLimit is a worthy debate, but that does not change the fact that a follower should be trying to connect to the leader until the initLimit time has elapsed (according to the documentation). The timeout on the socket is set to self.tickTime * self.initLimit, which seems to indicate that the initLimit time is adhered to, but only for certain classes of failures.

        Per my understanding, the leader waits for tickTime * initLimit and gives up leadership if a majority of followers do not join it in that time. So I think the text is written from the leader's perspective. The follower does not wait for that amount of time, but keeps re-running leader election and then joining the leader. In your case, the follower was not able to join the leader after 5 attempts. The follower will start a new leader election, but that will not result in a new leader election round at the leader. The leader will wait for 40 seconds before giving up leadership. In the meantime, the follower will complete the second round of election, elect the same leader, and again try to join it. I would expect this process to complete soon (within the leader's 40-second wait).
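        A back-of-the-envelope version of that timeline, using the settings from this report; the election-round duration below is an assumed value for illustration only:

        // Rough model of the two timescales described above. tickTime and
        // initLimit come from this report; the election-round duration is an
        // assumption, not a measured value.
        public final class RejoinTimeline {
            public static void main(String[] args) {
                int tickTime = 4000, initLimit = 10;
                long leaderWaitMs = (long) tickTime * initLimit; // leader gives up after 40 s

                long followerAttemptsMs = 5 * 1000; // 5 connect attempts ~1 s apart
                                                    // (refused connects fail immediately)
                long reElectionMs = 2000;           // assumed: one election round takes a few seconds

                long cycleMs = followerAttemptsMs + reElectionMs;
                System.out.printf("leader waits %d ms; one fail-and-retry cycle ~%d ms "
                        + "-> about %d cycles fit in the window%n",
                        leaderWaitMs, cycleMs, leaderWaitMs / cycleMs);
            }
        }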

        Without logs it will be hard to pinpoint the problem. Can you also take a stack trace at the leader and the follower?

        Can you give details about your experiment (you mentioned you also replaced a node)? Also, what is your syncLimit? There have been some fixes in the code path in question (ZOOKEEPER-822 and ZOOKEEPER-900). Higher syncLimit values or the failure of a node during leader election can cause significant delays without these fixes.

        Note that there have been many fixes in leader election since ZK 3.2.2.

        We are committed to zookeeper 3.2.2 for the moment, since we do not have sufficient resources to test a later zookeeper in time for inclusion in our next product release. We will be implementing other workarounds for the moment, but it seemed like a good idea to raise the issue and see whether this is the desired behavior.

        Mahadev Konar added a comment -

        Not a blocker. Moving it out of the 3.4 release.


          People

          • Assignee: Unassigned
          • Reporter: Alexandre Hardy
          • Votes: 1
          • Watchers: 3
