Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not A Problem
    • Affects Version/s: 3.3.6, 3.4.5
    • Fix Version/s: 3.4.6
    • Component/s: quorum
    • Labels:
      None
    • Release Note:
      During a rolling upgrade from the 3.3 branch to the 3.4 branch, a 3.3 server cannot follow a 3.4 leader. If an election takes place during the upgrade and a 3.4 server becomes the leader, any 3.3 server will be unavailable until it is upgraded. If a 3.3 server is leading during the upgrade and is the last server to be upgraded, no problem should be observed.
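      The compatibility rule in the release note can be written as a small check for planning the order of a rolling upgrade. This is a hypothetical sketch: the RollingUpgradeCheck class, the two-value Version enum and the server names are illustrative and are not part of ZooKeeper. It encodes only the one incompatibility stated in this issue, a 3.3 follower with a 3.4 leader.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the rolling-upgrade rule from the release note:
 * a 3.3 server cannot follow a 3.4 leader. Nothing here is a ZooKeeper API.
 */
public class RollingUpgradeCheck {

    enum Version { V3_3, V3_4 }

    static final class Server {
        final String name;
        final Version version;

        Server(String name, Version version) {
            this.name = name;
            this.version = version;
        }
    }

    /** True if a follower on this version can sync with a leader on that version. */
    static boolean canFollow(Version follower, Version leader) {
        // The only incompatible pairing described in the issue.
        return !(follower == Version.V3_3 && leader == Version.V3_4);
    }

    /** Names of the non-leader servers that would be unable to join the quorum. */
    static List<String> unavailable(List<Server> ensemble, Server leader) {
        List<String> out = new ArrayList<>();
        for (Server s : ensemble) {
            // Identity comparison: the leader is one of the ensemble objects.
            if (s != leader && !canFollow(s.version, leader.version)) {
                out.add(s.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Server a = new Server("server1", Version.V3_4);
        Server b = new Server("server2", Version.V3_4);
        Server c = new Server("server3", Version.V3_3);
        List<Server> ensemble = List.of(a, b, c);

        // An election mid-upgrade picked a 3.4 leader: server3 stays out until upgraded.
        System.out.println(unavailable(ensemble, a)); // [server3]
        // The 3.3 server still leads: the upgraded servers can follow it.
        System.out.println(unavailable(ensemble, c)); // []
    }
}
```

      In other words, keeping a 3.3 server as leader and upgrading it last avoids the unavailable window, as the release note says.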

      Description

      When a 3.3 server attempts to join an existing quorum led by a 3.4 server, the 3.3 server is disconnected while downloading the leader's snapshot. It then restarts the join process from the beginning, but it never manages to join the quorum.

      3.3 server log:

      2012-12-07 10:44:34,582 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from leader
      2012-12-07 10:44:34,582 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
      2012-12-07 10:44:54,604 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the leader
      java.io.EOFException
              at java.io.DataInputStream.readInt(DataInputStream.java:392)
              at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
              at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
              at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
              at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
              at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
              at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
              at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
      2012-12-07 10:44:54,605 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
      java.lang.Exception: shutdown Follower
              at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
              at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
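      The zxids in the leader log below (for example peerLastZxid=0x1100000000 and "zxid of leader is 0x1200000000") pack an epoch into the high 32 bits and a per-epoch counter into the low 32 bits. The standalone sketch below decodes them; the ZxidDecode class and its methods are hypothetical names (ZooKeeper has its own helpers for this). Decoded, the joining peer was at epoch 0x11 and the leader at epoch 0x12, both with counter 0. If the "12" in the follower's "Setting leader epoch 12" line above is hexadecimal, it matches the leader's epoch.

```java
/**
 * Hypothetical standalone decoder for ZooKeeper zxids:
 * epoch in the high 32 bits, counter in the low 32 bits.
 */
public class ZxidDecode {

    static long epoch(long zxid) {
        return zxid >>> 32;          // high 32 bits
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;   // low 32 bits
    }

    public static void main(String[] args) {
        long[] zxids = {0x1100000000L, 0x1200000000L}; // values taken from the leader log
        for (long z : zxids) {
            System.out.printf("zxid 0x%x -> epoch 0x%x (%d), counter %d%n",
                    z, epoch(z), epoch(z), counter(z));
        }
        // prints: zxid 0x1100000000 -> epoch 0x11 (17), counter 0
        // prints: zxid 0x1200000000 -> epoch 0x12 (18), counter 0
    }
}
```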
      

      3.4 leader log:

      2012-12-07 10:51:35,178 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - Backward compatibility mode, server id=3
      2012-12-07 10:51:35,178 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 (n.peerEPoch), LEADING (my state)
      2012-12-07 10:51:35,182 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
      2012-12-07 10:51:35,182 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x1100000000
      2012-12-07 10:51:35,182 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
      2012-12-07 10:51:35,183 [myid:2] - INFO  [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x1100000000  zxid of leader is 0x1200000000sent zxid of db as 0x1200000000
      2012-12-07 10:51:55,204 [myid:2] - ERROR [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
      java.net.SocketTimeoutException: Read timed out
              at java.net.SocketInputStream.socketRead0(Native Method)
              at java.net.SocketInputStream.read(SocketInputStream.java:150)
              at java.net.SocketInputStream.read(SocketInputStream.java:121)
              at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
              at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
              at java.io.DataInputStream.readInt(DataInputStream.java:387)
              at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
              at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
              at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
              at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
      2012-12-07 10:51:55,205 [myid:2] - WARN  [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE /127.0.0.1:37654 ********
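      Read together, the two stack traces suggest that after the snapshot was sent, each side was blocked reading the next packet from the other. In the leader log, LearnerHandler times out with SocketTimeoutException about 20 seconds after "Sending snapshot" and then closes the connection ("GOODBYE"); in the 3.3 log, Learner.syncWithLeader hits EOFException about 20 seconds after "Getting a snapshot from leader". The sketch below reproduces that pairing of exceptions with plain java.net sockets and no ZooKeeper code; the MutualReadDemo class and the 200 ms timeout are arbitrary choices for illustration. It shows the symptom only and makes no claim about which packet each server was actually waiting for.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.CompletableFuture;

/**
 * Both ends of a TCP connection block in readInt(). The side with a read
 * timeout ("leader") gets SocketTimeoutException and closes the socket;
 * the other side ("follower") then sees EOFException.
 */
public class MutualReadDemo {

    static String[] run(int timeoutMs) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, lo)) {
            // "Follower": connects, then waits for a packet that never arrives.
            CompletableFuture<String> follower = CompletableFuture.supplyAsync(() -> {
                try (Socket s = new Socket(lo, server.getLocalPort());
                     DataInputStream in = new DataInputStream(s.getInputStream())) {
                    in.readInt();
                    return "follower read a value";
                } catch (EOFException e) {
                    return "follower: EOFException";
                } catch (IOException e) {
                    return "follower: " + e.getClass().getSimpleName();
                }
            });

            String leader;
            try (Socket s = server.accept();
                 DataInputStream in = new DataInputStream(s.getInputStream())) {
                s.setSoTimeout(timeoutMs); // read timeout on the "leader" side only
                in.readInt();              // also waits for the other side
                leader = "leader read a value";
            } catch (SocketTimeoutException e) {
                // try-with-resources has already closed the socket here,
                // which is what unblocks the follower with EOF.
                leader = "leader: SocketTimeoutException";
            }
            return new String[] {leader, follower.get()};
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : run(200)) {
            System.out.println(line);
        }
    }
}
```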
      

        Attachments

        1. ZOOKEEPER-1599.patch
          2 kB
          Skye Wanderman-Milne

            People

            • Assignee:
              Skye Wanderman-Milne
              Reporter:
              Skye Wanderman-Milne
            • Votes:
              0
              Watchers:
              7
