Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2555

zookeeper started in observer mode takes long time to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.5.1
    • None
    • server
    • None

    Description

      zookeeper started in observer mode takes long time (some times 25 seconds) to come to observer state once leader restarted and so client connected to observer mode has to wait for longer time to get service

      Steps to reproduce:-
      1. Start zookeeper in cluster mode in which one node is in observer mode
      2. stop the leader node (some times we need to wait for 30 secs to reproduce this issue)
      3. Start the leader node
      4. Check the observer node status -
      It will be in
      "Error contacting service. It is probably not running."

      and takes long time (25 secs) to come to observer mode. And hence client connected to this node will not get service during this time.
      Log at observer node is as below:-

      2016-09-06 17:49:14,774 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@459] - Cannot open channel to 3 at election address /10.18.221.194:3888
      java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
      at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
      at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
      at java.net.Socket.connect(Socket.java:579)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:444)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:485)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:421)
      at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:486)
      at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:465)
      at java.lang.Thread.run(Thread.java:722)
      2016-09-06 17:49:14,776 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LOOKING (n.state), 1 (n.sid), 0x2b (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
      2016-09-06 17:49:40,377 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FastLeaderElection@928] - Notification time out: 51200

      2016-09-06 17:49:40,378 [myid:2] - INFO [WorkerSender[myid=2]:QuorumCnxManager@278] - Have smaller server identifier, so dropping the connection: (3, 2)
      2016-09-06 17:49:40,379 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
      2016-09-06 17:49:40,381 [myid:2] - INFO [/10.18.219.50:3888:QuorumCnxManager$Listener@637] - Received connection request /10.18.221.194:34085
      2016-09-06 17:49:40,388 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@688] - Notification: 2 (message format version), 3 (n.leader), 0x2a00000001 (n.zxid), 0x7 (n.round), LEADING (n.state), 3 (n.sid), 0x2c (n.peerEPoch), LOOKING (my state)100000000 (n.config version)
      2016-09-06 17:49:40,388 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id2,name1=replica.2,name2=LeaderElection]
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):QuorumPeer@1049] - OBSERVING
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@858] - minSessionTimeout set to 4000
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@867] - maxSessionTimeout set to 40000
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ZooKeeperServer@156] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2 snapdir /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):ObserverZooKeeperServer@56] - syncEnabled =true
      2016-09-06 17:49:40,389 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Observer@72] - Observing /10.18.221.194:2888
      2016-09-06 17:49:40,396 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileSnap@83] - Reading snapshot /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2a00000001
      2016-09-06 17:49:40,410 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@369] - Getting a snapshot from leader
      2016-09-06 17:49:40,411 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@509] - Learner received NEWLEADER message
      2016-09-06 17:49:40,411 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):FileTxnSnapLog@298] - Snapshotting: 0x2c00000000 to /home/Rakesh/Zookeeper/18_Aug/cluster/zookeeper-3.5.1-alpha/data/version-2/snapshot.2c00000000
      2016-09-06 17:49:40,417 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Learner@493] - Learner received UPTODATE message
      2016-09-06 17:49:40,417 [myid:2] - INFO [QuorumPeer[myid=2](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):CommitProcessor@254] - Configuring CommitProcessor with 8 worker threads.

      Attachments

        Activity

          People

            Unassigned Unassigned
            rakeshsingh Rakesh Kumar Singh
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: