  ZooKeeper / ZOOKEEPER-2172

Cluster crashes when reconfig a new node as a participant

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.3, 3.6.0
    • Component/s: leaderElection, quorum, server
    • Labels:
      None
    • Environment:

      Ubuntu 12.04 + java 7

    • Hadoop Flags:
      Reviewed

      Description

      The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state from which it cannot recover.

      I found “2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 log. So the first node received the reconfig cmd at 12:53:48. Later, it logged “2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - ******* GOODBYE /10.0.0.2:55890 ********”. From then on, the first node and second node rejected all client connections and the third node didn’t join the cluster as a participant. The whole cluster was effectively down.

      When the problem happened, all three nodes just used the same dynamic config file zoo.cfg.dynamic.10000005d which only contained the first two nodes. But there was another unused dynamic config file in node-1 directory zoo.cfg.dynamic.next which already contained three nodes.

      When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn’t show up again, so it is likely a race condition.

      1. history.txt
        63 kB
        Hitoshi Mitake
      2. node-1.log
        171 kB
        Ziyou Wang
      3. node-2.log
        108 kB
        Ziyou Wang
      4. node-3.log
        20 kB
        Ziyou Wang
      5. zoo.cfg.dynamic.10000005d
        0.2 kB
        Ziyou Wang
      6. zoo.cfg.dynamic.next
        0.2 kB
        Ziyou Wang
      7. zoo-1.log
        1.55 MB
        Ziyou Wang
      8. zoo-2.log
        495 kB
        Ziyou Wang
      9. zoo-2-1.log
        1.60 MB
        Ziyou Wang
      10. zoo-2-2.log
        435 kB
        Ziyou Wang
      11. zoo-2212-1.log
        3.92 MB
        Ziyou Wang
      12. zoo-2212-2.log
        3.10 MB
        Ziyou Wang
      13. zoo-2212-3.log
        120 kB
        Ziyou Wang
      14. zoo-2-3.log
        60 kB
        Ziyou Wang
      15. zoo-3.log
        57 kB
        Ziyou Wang
      16. zoo-3-1.log
        1.76 MB
        Ziyou Wang
      17. zoo-3-2.log
        860 kB
        Ziyou Wang
      18. zoo-3-3.log
        68 kB
        Ziyou Wang
      19. zoo-4-1.log
        1.39 MB
        Ziyou Wang
      20. zoo-4-2.log
        440 kB
        Ziyou Wang
      21. zoo-4-3.log
        60 kB
        Ziyou Wang
      22. zookeeper-1.log
        1.31 MB
        Ziyou Wang
      23. zookeeper-1.out
        51 kB
        Hitoshi Mitake
      24. zookeeper-2.log
        253 kB
        Ziyou Wang
      25. zookeeper-2.out
        33 kB
        Hitoshi Mitake
      26. ZOOKEEPER-2172.patch
        0.7 kB
        Hitoshi Mitake
      27. ZOOKEEPER-2172-02.patch
        2 kB
        Mohammad Arshad
      28. ZOOKEEPER-2172-03.patch
        12 kB
        Mohammad Arshad
      29. ZOOKEEPER-2172-04.patch
        13 kB
        Mohammad Arshad
      30. ZOOKEEPER-2172-06.patch
        14 kB
        Alexander Shraer
      31. ZOOKEEPER-2172-07.patch
        13 kB
        Mohammad Arshad
      32. zookeeper-3.log
        62 kB
        Ziyou Wang
      33. zookeeper-3.out
        59 kB
        Hitoshi Mitake
      34. ZOOKEPER-2172-05.patch
        16 kB
        Alexander Shraer


          Activity

          ziyouw Ziyou Wang added a comment -

          Error log when this bug happened.

          michim Michi Mutsuzaki added a comment -

          Ziyou Wang could you also post the initial configuration files for all the nodes? Alexander Shraer could you take a look at these log files when you get a chance?

          shralex Alexander Shraer added a comment -

          Thanks Michi for the pointer, I'll take a look, but may not have time today.

          In general the .next file is created when a server acks a reconfig request and deleted when it commits the request.
          We check that a quorum is synced before starting the reconfig but once someone knows about a reconfig proposal and can talk to a quorum of nodes, there is no going back – if a quorum of the new config isn't available the system will get stuck (just like without reconfig the system will get stuck if there's no quorum).

          michim Michi Mutsuzaki added a comment -

          Would it be possible that this is hitting the case described in http://zookeeper.apache.org/doc/trunk/zookeeperReconfig.html#sc_reconfig_general :

          Finally, note that once connected to the leader, a joiner adopts the last committed configuration, in which it is absent (the initial config of the joiner is backed up before being rewritten). If the joiner restarts in this state, it will not be able to boot since it is absent from its configuration file. In order to start it you’ll once again have to specify an initial configuration.

          shralex Alexander Shraer added a comment -

          Quite possible. The question is why 1 and 2 weren't enough to form a quorum; 3 doesn't need to be up if both 1 and 2 are participants in the last committed config.

          shralex Alexander Shraer added a comment -

          Looks like the config here is using the patch from ZOOKEEPER-2031.

          michim Michi Mutsuzaki added a comment -

          node1 doesn't seem to receive the vote from itself. it receives votes from node2 and node3:

          node-1.log:2015-04-20 12:55:03,358 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@698] - Notification: 2 (message format version), 2 (n.leader), 0x100000084 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LEADING (my state)10000005d (n.config version)
          node-1.log:2015-04-20 12:55:51,547 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@698] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x1 (n.peerEPoch), LEADING (my state)10000005d (n.config version)
          

          node2 receives votes from node1 and itself:

          node-2.log:2015-04-20 12:55:03,361 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10000005d (n.config version)
          node-2.log:2015-04-20 12:55:54,564 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 2 (n.leader), 0x100000084 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10000005d (n.config version)
          

          Is node3's vote somehow confusing node1?

          Yes, I think this cluster is using the patch from ZOOKEEPER-2031. Do you think that might be related to this issue?

          shralex Alexander Shraer added a comment -

          Server 1 continues to be leading till the end of the execution. It is able to push its version to server 3, but for some reason its leader election messages have round 0xffffffffffffffff, so I think this is why servers 2 and 3 don't adopt it as leader. It also doesn't time out for some reason.

          Flavio Junqueira, this looks related to ZOOKEEPER-1732 and ZOOKEEPER-1805, any thoughts?

          Unlike what the description says, the .next file provided here is identical to the other config file and contains a config with servers 1 and 2, so it probably resulted from the reconfiguration adding server 2. Which server is this coming from? Server 2, probably? That may happen if server 1 committed the reconfig but server 2 hasn't learned the commit yet (but its other config file has to be different in this case).

          It would be helpful if you could reproduce the scenario without the ZK-2031 patch and provide all the config files, including the initial ones, from the servers, as well as all the reconfig commands you run and when.

          I can see from the logs that there were many attempts to reconfigure (probably to add server 2) before it was synced with server 1, so they failed, which is normal. Then a reconfig succeeds at 12:51:48, and more reconfig commands are invoked (e.g. at 12:51:56), which is before server 3 even starts. Is this intentional? What do these commands attempt to do?

          michim Michi Mutsuzaki added a comment -

          Ziyou Wang would it be possible to reproduce it without zk-2031? If not, could you try reproducing this with debug log enabled? Thanks!

          ziyouw Ziyou Wang added a comment -

          Hi Alexander and Michi,

          Thanks a lot for looking at this bug. ZK-2031 was developed as part of our system, so I cannot remove that patch to reproduce the problem. I will try to reproduce the problem with the debug log enabled.

          BTW, we have hit this problem many times since reporting this issue. If we move to a slower environment, e.g. a nested host, it needs more time between start and reconfig to avoid it. So I suspect it may be caused when node 3 hasn't finished syncing the latest data from the leader, and the reconfig comes in and breaks something. Do you have any ideas about this?

          For Alexander's questions: our system had a bug in that case when this jira was reported. It just retried the reconfig to add node 2 even though node 2 was already in the list. I have already fixed that, but we still hit this jira's problem now.

          ziyouw Ziyou Wang added a comment -

          I enabled the ZooKeeper debug log and reproduced the problem again.

          ziyouw Ziyou Wang added a comment -

          No, it is just a bug in the previous system: it would retry the reconfig for the second node even though it had already joined the cluster. It also just used "reconfig -add" to add the second node to the participant list.

          ziyouw Ziyou Wang added a comment -

          Hi Michi Mutsuzaki and A. M.,

          I enabled the debug log for ZooKeeper and reproduced this problem. Could you help check the problem again? Thanks a lot.

          michim Michi Mutsuzaki added a comment -

          Thanks Ziyou.

          Flavio Junqueira Alexander Shraer it looks like node1 and node2 are not forming a quorum because node2 has seen zxid 0x100000059 but node1 keeps sending 0x0 as its zxid. Isn't node1 supposed to send the highest zxid it has seen?

          From zookeeper-1.log:

          2015-05-25 12:34:36,920 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 100000049
          2015-05-25 12:34:39,090 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 100000049
          2015-05-25 12:35:28,128 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=2 zxid=0x0 leader=1 config version = 100000049
          2015-05-25 12:35:30,301 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@423] - Sending new notification. My id =1 recipient=3 zxid=0x0 leader=1 config version = 100000049
          

          From zookeeper-2.log:

          2015-05-25 12:34:36,918 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 2 (n.leader), 0x100000059 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)100000049 (n.config version)
          2015-05-25 12:34:36,923 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@698] - Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)100000049 (n.config version)
          2015-05-25 12:35:28,124 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x100000059 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x1 (n.peerEpoch)
          2015-05-25 12:35:28,125 [myid:2] - DEBUG [QuorumPeer[myid=2]/10.0.0.2:1300:FastLeaderElection@688] - Sending Notification: 2 (n.leader), 0x100000059 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x1 (n.peerEpoch)
          
          shralex Alexander Shraer added a comment -

          Michi, I agree that it's weird, and again I see this 0xffffffffffffffff round number, which I think causes the other server to ignore the leader's messages and continue looking even though server 1 is leading (without timing out for some reason). This looks related to ZOOKEEPER-1732 and ZOOKEEPER-1805. Flavio Junqueira could you take a look?

          It also seems like there are a lot of client sessions being established and destroyed (clients connect and disconnect).
          And in particular when the reconfig adding server 3 happens (12:33:21,797 on server 1) the client session 0x14d8b08424a0014 (this is the client that submitted the reconfig) gets closed in the middle of the operation. Then, the connection to server 2 is suddenly closed with the error (on server 2)

          2015-05-25 12:33:25,786 [myid:2] - WARN [QuorumPeer[myid=2]/10.0.0.2:1300:Follower@92] - Exception when following the leader java.net.SocketTimeoutException: Read timed out

          Could it be that the termination of a client session in the middle of an op messes up server-to-server connections?

          ziyouw Ziyou Wang added a comment -

          The reason there are many client connects and disconnects is that we use the zk CLI to do the operations, so the CLI just creates a client, does the operation, and shuts it down.

          From our tests, I think it is a race condition problem and may not be related to the CLI. When the cluster is running in a faster environment, we just need to wait a short interval between starting node 3 and reconfiguring it to join the cluster, and then we can almost always avoid this problem. But when we move to a slower environment, it can be reproduced consistently again. So the temporary solution is extending the waiting time, which lets us avoid it again.

          shralex Alexander Shraer added a comment -

          When you run in the slower environment, does server 2 still disconnect (as in the logs) when you add server 3?

          Can you please provide your initial configs for all 3 servers, and the reconfig commands you run? I'd like to try to reproduce this.

          ziyouw Ziyou Wang added a comment -

          Yes, it has the same problem.

          The step is:
          1. Start server 1, its zoo.cfg contains:
          standaloneEnabled=false
          syncLimit=2
          tickTime=2000
          initLimit=5

          and its dynamic config file just contains itself.

          2. Start server 2; its zoo.cfg is the same and its initial dynamic config file contains 1 and itself. Then reconfig it as a participant after it syncs data from 1 (this needs to be retried a few times until the reconfig can be completed).

          3. Start server 3, with the same zoo.cfg and an initial dynamic config file containing 1, 2, and itself. Then reconfig it as the third participant (this is where we hit the bug).

          Since it is a race condition, it may not be easy to reproduce in your environment. In my dev setup, the zk servers run in a docker-in-VM environment. If I change the script to wait just 5 secs after starting server 3, the problem can be reproduced every time.
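          For reference, a rough sketch of what the dynamic config files and the reconfig command could look like for these steps; the ports and client ports below are made up for illustration, and the actual values are in the attached zoo.cfg.dynamic files:

          # zoo.cfg.dynamic for server 1 at step 1 (hypothetical ports)
          server.1=10.0.0.1:2888:3888:participant;2181

          # initial zoo.cfg.dynamic for server 3 at step 3: lists 1, 2 and itself
          server.1=10.0.0.1:2888:3888:participant;2181
          server.2=10.0.0.2:2888:3888:participant;2181
          server.3=10.0.0.3:2888:3888:participant;2181

          # reconfig issued through the CLI against server 1 after server 3 starts
          reconfig -add server.3=10.0.0.3:2888:3888:participant;2181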

          shralex Alexander Shraer added a comment -

          Thanks. The basic scenario you described works for me, but you're right of course - there may be a race condition.

          It would be great if you could help investigate this further. For example, why does server 2 disconnect when you add server 3? BTW, server 3 doesn't even have to be up in order for you to add it to the ensemble, since you have 2 out of 3 servers without it.

          I also wonder if the problem has anything to do with the client shutting off right after – does it happen if you don't close the connection on the client?

          ziyouw Ziyou Wang added a comment -

          BTW, all the reconfigs are sent to server 1 to be processed.

          I can look into the problem, but I am not familiar with the state machine in ZooKeeper. And you are right, we could reconfig server 3 even before it starts. But since we need to deploy the cluster automatically, it would be weird to reconfig server 2 after it starts but server 3 before it starts.

          As I mentioned, we just use the CLI to do the reconfig, so I need to change the CLI to test this. I think the client should already have gotten the result from the server before it shuts down.

          shralex Alexander Shraer added a comment -

          I was just suggesting this to see if we can minimize the scenario causing the race condition. For example, what happens when you don't start server 3 at all? I also really doubt that the problem is at the client. ZK should work whether or not the client disconnects in the middle of an operation. I'm just trying to think how we could track down the bug.

          I'd try to understand whether the client's shutdown is somehow not handled well on the server.
          Could you see if the problem is still there when you (a) don't shut down the client after the reconfig? If the problem goes away, can you try (b) doing a write after the reconfig and shutting down the client only after the write completes?
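          For what it's worth, test (b) could look roughly like the following zkCli session; the server address, znode path, and data here are hypothetical, and the reconfig arguments would match whatever you use today:

          $ bin/zkCli.sh -server 10.0.0.1:2181
          reconfig -add server.3=10.0.0.3:2888:3888:participant;2181
          create /reconfig-probe "done"
          quit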

          ziyouw Ziyou Wang added a comment -

          OK, I see. I will try this.

          For debugging purposes, do you want to add more logging to the related code? You can point out the places and I can add them. Right now, even with debug logging enabled, I still don't see much useful logging at the moment the bug happens.

          For the server 2 disconnect problem: since all the servers are running on one OS (in different containers), they use exactly the same system clock. Server 2 got a read timeout first and server 1 shut down the connection later. So I think server 1's state machine may have entered some error state first.

          shralex Alexander Shraer added a comment -

          I'm not sure where to add the debug messages yet, but happy to point you to the relevant places once we determine where they are. Is it possible that servers 2 and 3 are using the same resources somehow? Ports, data directory, or something else? This may explain why one joining messes up the connection to the other.

          ziyouw Ziyou Wang added a comment -

          No, they are in totally independent docker containers (with isolated IPs, ports, and file systems), so they don't share any resources directly.

          I made the client change, but the problem is still there. I also tried the same reconfig steps in a faster environment (outside of the docker containers), and it cannot be reproduced there any more. So the slow environment just extends the bug window and makes it reproducible consistently.

          shralex Alexander Shraer added a comment -

          Can you post the logs from the run you mention where the client doesn't disconnect?

          fpj Flavio Junqueira added a comment -

          There are a few really weird things here. Check these notifications:

          Notification: 2 (message format version), -9223372036854775808 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LEADING (my state)100000049 (n.config version)
          

          I checked the logs of 3 and it does sound like it sent this notification.

          Sending Notification: -9223372036854775808 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 3 (myid), 0x0 (n.peerEpoch)
          

          The initialization of leader election here doesn't sound right. And, as Alexander Shraer has pointed out, 2 and 3 apparently received notifications with 0xffffffffffffffff as the round of the sender.

          Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)100000049 (n.config version)
          

          I found no evidence in the log of 1 that it has actually set or sent such a value.

          The values I'm seeing in the notification across logs look a bit strange.

          michim Michi Mutsuzaki added a comment -

          I'm guessing node1 is hitting this case? https://github.com/apache/zookeeper/blob/76bb6747c8250f28157636cf4011b78e7569727a/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L332 In this case we don't log the message that gets sent out.
          ziyouw Ziyou Wang added a comment -

          In this test, the ReconfigCommand class is changed to keep the client around after finishing the reconfig request. The client connection for reconfiguring node 2 stays open for about 5 extra seconds. But the client connection for reconfiguring node 3 still fails immediately. So it means the client connection is broken by something on the server side.

          ziyouw Ziyou Wang added a comment -

          Hi Alexander Shraer,

          I tried this and uploaded the logs. It seems the bug itself breaks the client connection. On the client side, after it issues the request, I just get an exception: KeeperErrorCode = ConnectionLoss. So even if we don't close the session, the server does.

          ziyouw Ziyou Wang added a comment -

          When the bug is triggered, the client connection used to issue the reconfig request is also broken. The client gets a ConnectionLoss error as the reconfig result.

          rgs Raul Gutierrez Segales added a comment -

          That's expected right?

          rgs Raul Gutierrez Segales added a comment -

          Slightly related: ZOOKEEPER-2202.

          ziyouw Ziyou Wang added a comment -

          Alexander Shraer thought the bug may be caused when the client connection is closed. But the result shows that the server closes the connection before the client does. In the correct case, the connection should be kept until the client receives the new config status.

          shralex Alexander Shraer added a comment -

          In zoo-2-3.log you can see server 3 processing requests till the end of the run. The other servers seem ok too, the only thing is the connection being closed right after the reconfig, but this is expected - the leader election algorithm is being reset. Server 2 should still be connected to server 1 through the leader port and can continue processing requests. So the last logs seem quite ok.

          Can we conclude that the bug happens when the client disconnects before getting a response?

          ziyouw Ziyou Wang added a comment -

          The bug shows up again with the same code change as last time.

          ziyouw Ziyou Wang added a comment -

          The bug just was not triggered in that run. Please check the new log: I used the same code to try again and hit the bug this time. I really doubt that it is caused by client connections, because it is hard to explain why we cannot hit this bug with the same client after waiting some time.

          rgs Raul Gutierrez Segales added a comment -

          Ziyou Wang: it might be helpful to dump the FLE (FastLeaderElection) and ZAB messages when the bug is triggered. You could do this with zktraffic (https://github.com/twitter/zktraffic), i.e.:

          
          

          $ sudo pip install zktraffic
          $ sudo fle-dump --iface=eth0

          # from another terminal
          $ sudo zab-dump --iface=eth0
          suda Akihiro Suda added a comment -

          Ziyou Wang
          FYI this patch for zktraffic might be useful: https://github.com/twitter/zktraffic/pull/30 (Deeper dissection of Quorum packets and Reconfig packets)

          shralex Alexander Shraer added a comment -

          Server 1:

          2015-06-03 17:15:27,692 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::QuorumCnxManager@365] - Opening channel to server 3
          2015-06-03 17:15:27,710 [myid:1] - DEBUG [ProcessThread(sid:1 cport:-1)::QuorumCnxManager@371] - Connected to server 3
          ...
          2015-06-03 17:15:31,649 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:50710:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open
          java.net.SocketTimeoutException: Read timed out
          at java.net.SocketInputStream.socketRead0(Native Method)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.net.SocketInputStream.read(Unknown Source)
          at java.io.BufferedInputStream.fill(Unknown Source)
          at java.io.BufferedInputStream.read(Unknown Source)
          at java.io.DataInputStream.readInt(Unknown Source)
          at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
          at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
          at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
          at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:493)
          2015-06-03 17:15:31,650 [myid:1] - WARN [LearnerHandler-/10.0.0.2:50710:LearnerHandler@595] - ******* GOODBYE /10.0.0.2:50710 ********
          Server 3:

          2015-06-03 17:15:27,702 [myid:3] - INFO [/10.0.0.3:1200:QuorumCnxManager$Listener@556] - Received connection request /10.0.0.1:39184
          2015-06-03 17:15:31,718 [myid:3] - WARN [/10.0.0.3:1200:QuorumCnxManager@268] - Exception reading or writing challenge: java.net.SocketTimeoutException: Read timed out

          It seems that server 3 is still in the initial connection phase while server 1 already passed it and wants to start working (fails on the qp = new QuorumPacket() line).

          It may be helpful to log everything server 3 gets from server 1 before it crashes in QuorumCnxnManager receiveConnection, exactly what execution path is taken, etc. You could also print server 1's side in LearnerHandler.java

          Also, are you sure that the zookeeper distribution on these two servers is the same ?

          shralex Alexander Shraer added a comment -

          btw, sorry for being slow to respond

          shralex Alexander Shraer added a comment -

          I might have misread the log - the exception on server 1 may be due to a problem talking with server 2. It would be good to log the server id to which the LearnerHandler belongs.

          Another suggestion is to log more things in Follower.java - case Leader.PROPOSAL and Leader.COMMIT.
          It's possible that something happens there that causes the follower to hang and the leader's socket to time out.

          ziyouw Ziyou Wang added a comment -

          Yes, I also suspect this happens when server 3 is still syncing data from server 1 or server 2. That would explain why this bug is easier to trigger in a slow environment and can be avoided by waiting.

          So you mean I need to add extra logging in QuorumCnxnManager and LearnerHandler for this case?

          Yes, the three servers use the same docker image.

          ziyouw Ziyou Wang added a comment -

          Hi Raul Gutierrez Segales and Akihiro Suda,

          Thanks a lot for this info. I can try this when I try to reproduce it next time.

          shralex Alexander Shraer added a comment -

          Yeah, just add a lot of logging statements if you could, in QuorumCnxnManager, LearnerHandler and Follower.java where proposals and COMMITS are received. I'd actually start from the latter to see if server 2 somehow gets stuck in the middle of processing a reconfig command.

          shralex Alexander Shraer added a comment -

          Hi Ziyou Wang, could you please check whether the bug still exists with the 2212 patch applied?

          ziyouw Ziyou Wang added a comment -

          Hi Alexander Shraer, thanks for pointing this out. I tried the 2212 patch today and found that it indeed reduces the chance of reproducing the problem. From the log, I can see this logic:

          if (!rqv.equals(curQV)) {
              LOG.info("restarting leader election");
              self.shuttingDownLE = true;
              self.getElectionAlg().shutdown();
              break;
          }

          is executed once in node 2 and twice in node 3 in the successful case.

          But I can still hit the problem (although it is harder than before). In the failing case, I find the above logic is executed only once, in node 3.

          Sorry for updating this jira late; I was busy with some urgent issues.

          ziyouw Ziyou Wang added a comment -

          Log files after applying patch 2212. The bug is harder to reproduce after applying 2212.

          suda Akihiro Suda added a comment -

          ZOOKEEPER-2212 seems not directly related to this bug.

          If ZK hits a race condition that is resolved in ZOOKEEPER-2212, LOG.debug("Skip processReconfig(), state: {}", self.getServerState()); must be found in log files.
          https://github.com/apache/zookeeper/blob/ec056d3c3a18b862d0cd83296b7d4319652b0b1c/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L308

          Perhaps the bug got harder to hit due to other causes (e.g., the network latency got shorter)?

          ziyouw Ziyou Wang added a comment -

I just checked the patch again. Yes, we should not hit 2212 here. But it is really strange to see its effect in our case. I just run each node in one of three Docker containers on the same host, so I don't think the network latency changes, but I am not sure whether the disk latency changes or not.

          suda Akihiro Suda added a comment -

          Hi,

It is very interesting that both servers 1 and 2 time out at 15:55:08,439 after the reconfig begins at 15:55:04.
          After these timeouts, neither ZooKeeperServer can be revived and the ensemble gets into a weird state.
          (However, in zoo-3-2.log (Jun 3), server 2 raises an EOFException, not a SocketTimeoutException, at 17:15:31.)

          These timeouts are raised by this while loop in server 1 and this while loop in server 2.

          Unfortunately, we are not sure which types of QuorumPacket are triggering these timeouts.
          So I think it might be helpful to add LOG.debug(qp.getType()) at this switch for server 1 and this switch for server 2.
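
          For illustration, a minimal sketch of that kind of logging on the follower side (hypothetical placement, not a patch; only the debug line is new):

              protected void processPacket(QuorumPacket qp) throws Exception {
                  // hypothetical debug line: record every quorum packet type the follower reads,
                  // so the packet (or its absence) preceding a read timeout shows up in the log
                  LOG.debug("received quorum packet, type={}", qp.getType());
                  switch (qp.getType()) {
                  case Leader.PING:
                      ping(qp);
                      break;
                  // ... remaining cases unchanged
                  }
              }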

          Perhaps they are not pinging each other?
          This comment in LearnerHandler.ping() seems interesting.

          // If learner hasn't sync properly yet, don't send ping packet
          // otherwise, the learner will crash

          ziyouw Ziyou Wang added a comment -

Added a log statement to record quorum packet types.

          ziyouw Ziyou Wang added a comment -

Thanks for looking into this. I always suspected this problem may be related to the sync, because I need to wait longer to avoid it when the cluster is running with a slow disk.

          I uploaded the log files after adding the log statement to record the quorum packet type.

          suda Akihiro Suda added a comment -

Hi, Ziyou Wang,

          Thank you for logging.

From the latest logs, we can confirm that:

          • server 1 is not pinging (packet type=5) server 2 after the reconfig at 13:30:33,573, and hence server 2 cannot pong back to server 1. Server 1 then times out waiting for the pong from server 2 and shuts down (this at 13:30:37,560 and this at 13:30:53,638).
          • server 1 is not proposing (packet type=2) to server 2. CommitProcessor is working at 13:30:33,575, but ProposalRequestProcessor seems not to be.

          Maybe Leader.lead() and Leader.propose() are racing due to some reason related to syncing?
          I will check this later.

          ziyouw Ziyou Wang added a comment -

Can we add more logs to find the race condition here? I guess the problem happens when server 3 is still syncing data from the leader. So if you can slow down this sync process and do the reconfig before it finishes, we may also be able to reproduce the problem in a normal environment.

          suda Akihiro Suda added a comment -

Yes, as many logs as possible might be helpful.
          Plus, additional information such as the exact ZK version, workload scripts, or filesystem information might also be helpful.

          I am trying to reproduce the bug by injecting some Thread.sleep() calls into syncing-related functions using byteman.
          But I have not been able to reproduce the bug so far, as I am not sure which function should be injected.

          mitake Hitoshi Mitake added a comment -

          Hi Ziyou Wang,

It seems that I could reproduce this problem: just adding new servers with reconfig one by one, the ensemble ends up rejecting every client request.
          (Of course there is a possibility that I am misunderstanding something.)

          I used our distributed systems debugger named earthquake. It uses byteman to inspect the execution of the debuggee (the ZooKeeper server in this case) and tries to cause corner-case situations that are hard to produce in ordinary testing by reordering the inspected method calls and returns.

          We are preparing a Docker image so the problem can easily be reproduced in your environment. Please wait for a while.

          I'm analyzing the problem and would like to post the root cause and a patch, but it may take some time because I'm new to ZooKeeper. So I attached the logs (zookeeper-123.out) and the history of the ensemble (history.txt). The logs seem to be similar to yours.
          The history is in an earthquake-specific format, so it isn't easy to read, but I think you can interpret the event sequence roughly (it is just a sequence of method calls and returns plus their stack traces). It would be great to hear your comments.

          shralex Alexander Shraer added a comment -

          It looks like server 2 is crashing before the first reconfig is invoked (18:14:48 is when the reconfig happens).

          This is what happens on server 2:

          INFO: paramMap:

          {quorumPacket=PING 200000001 null}

          quorum packet send in Follower : org.jboss.byteman.rule.exception.ExecuteException: MethodExpression.interpret : exception invoking method packetToString file /earthquake/reconfig-trunk/materials/server.btm line 34
          2015-07-21 18:14:43,735 [myid:2] - ERROR [SyncThread:2:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:2
          org.jboss.byteman.rule.exception.ExecuteException: MethodExpression.interpret : exception invoking method packetToString file /earthquake/reconfig-trunk/materials/server.btm line 34
          at org.jboss.byteman.rule.expression.MethodExpression.interpret(MethodExpression.java:347)
          at org.jboss.byteman.rule.expression.MethodExpression.interpret(MethodExpression.java:334)
          at org.jboss.byteman.rule.Action.interpret(Action.java:144)
          at net.osrg.earthquake.PBEQHelper_HelperAdapter_Interpreted_2.fire(server.btm)
          at net.osrg.earthquake.PBEQHelper_HelperAdapter_Interpreted_2.execute0(server.btm)
          at net.osrg.earthquake.PBEQHelper_HelperAdapter_Interpreted_2.execute(server.btm)
          at org.jboss.byteman.rule.Rule.execute(Rule.java:684)
          at org.jboss.byteman.rule.Rule.execute(Rule.java:653)
          at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java)
          at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
          at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:186)
          at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
          Caused by: java.lang.NullPointerException
          at org.apache.zookeeper.server.quorum.LearnerHandler.packetToString(LearnerHandler.java:261)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:606)
          at org.jboss.byteman.rule.expression.MethodExpression.interpret(MethodExpression.java:341)
          ... 11 more
          2015-07-21 18:14:43,735 [myid:2] - INFO [SyncThread:2:ZooKeeperServer$ZooKeeperServerListenerImpl@442] - Thread SyncThread:2 exits, error code 1
          2015-07-21 18:14:43,735 [myid:2] - INFO [SyncThread:2:LearnerZooKeeperServer@165] - Shutting down
          2015-07-21 18:14:43,735 [myid:2] - INFO [SyncThread:2:ZooKeeperServer@465] - shutting down
          2015-07-21 18:14:43,735 [myid:2] - INFO [SyncThread:2:FollowerRequestProcessor@138] - Shutting down
          2015-07-21 18:14:43,735 [myid:2] - INFO [SyncThread:2:CommitProcessor@358] - Shutting down
          2015-07-21 18:14:43,736 [myid:2] - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@109] - FollowerRequestProcessor exited loop!

Then server 1, the leader, sends the incremental reconfig, times out, and closes the connection.

          2015-07-21 18:14:48,750 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@512] - Incremental reconfig
          Jul 21, 2015 6:14:48 PM net.osrg.earthquake.PBInspector EventFuncReturn
          INFO: paramMap:

          {quorumPacket=PROPOSAL 200000002 null}

          .....
          2015-07-21 18:15:02,739 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):LearnerHandler@937] - Closing connection to peer due to transaction timeout.

          shralex Alexander Shraer added a comment -

          I may be wrong, but it looks like the crash on server 2 happens after the sync, when the server tries to respond to the leader's PING. It executes the following:

    protected void processPacket(QuorumPacket qp) throws Exception {
        switch (qp.getType()) {
        case Leader.PING:
            ping(qp);

So it just tries to send a PING back. The ping() function in Learner.java sends the same packet back, only changing the data; it doesn't create a new packet. I wonder if it is somehow a problem that a new packet is not being created. The exception is a NullPointerException, so maybe the old packet somehow gets deallocated?
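
          A rough sketch of the reuse pattern being described (paraphrased, not the exact source; buildSessionTouchData() is a placeholder, not a real Learner method):

              protected void ping(QuorumPacket qp) throws IOException {
                  // reuse the leader's PING packet: only the payload is replaced before sending it back
                  qp.setData(buildSessionTouchData());   // placeholder for serializing the session touch table
                  writePacket(qp, true);
              }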

          mitake Hitoshi Mitake added a comment -

          Hi Alexander Shraer, thanks for your reply.

As you pointed out, the crash is caused in the inspection layer (written in byteman). Sorry for the noise.

          But the NullPointerException is a little bit odd. The exception is caused by a byteman script like this:
          RULE quorum packet receive in Follower
          CLASS Learner
          METHOD readPacket
          HELPER net.osrg.earthquake.PBEQHelper
          BIND argMap = new java.util.HashMap()
          AT EXIT
          IF $# == 1
          DO
          argMap.put("quorumPacket", org.apache.zookeeper.server.quorum.LearnerHandler.packetToString($1));
          eventFuncReturn("Learner.readPacket", argMap);
          ENDRULE

IIUC, the QuorumPacket will never be null in the follower. I'll look into the problem.

          ziyouw Ziyou Wang added a comment -

Thanks for looking at this. For this JIRA, I think the problem happens if we do the reconfig before the new node can sync all its data from the leader. So could we use earthquake to simulate this race condition? Currently, this problem can only be hit in a VM environment.

          mitake Hitoshi Mitake added a comment -

          Hi Ziyou Wang,

Could you check whether my understanding is correct? IIUC, your situation is like below:
          1. server 1 boots
          2. server 2 boots
          3. client issues reconfig to server 1
          4. server 2 tries to sync with server 1 via Learner.syncWithLeader()
          5. server 3 boots
          6. client issues reconfig to server 1

          (the reconfig requests in 3 and 6 overlap)

          If this is correct, I'll be able to reproduce the situation with earthquake.

          mitake Hitoshi Mitake added a comment -

Ziyou Wang BTW, if possible, could you share the Dockerfile you use for your testing?

          mitake Hitoshi Mitake added a comment -

          Sorry for bothering with duplicated replies...

          mitake Hitoshi Mitake added a comment -

          Hi Ziyou Wang,

I found a slightly strange code path, like below:
          1. In the tail of Leader.shutdown(), the leader tries to remove all learner handlers inside synchronized (learners). The loop calls LearnerHandler.shutdown().
          2. In LearnerHandler.shutdown(), leader.removeLearnerHandler() is called.
          3. In Leader.removeLearnerHandler(), the learners member of Leader is again locked with synchronized.

          It seems that the above sequence could cause a deadlock.

          I removed synchronized (learners) in removeLearnerHandler in the attached patch. Could you test it in your environment?

The targeted version is 3.5.0.

          mitake Hitoshi Mitake added a comment -

Sorry, synchronized is reentrant, so the patch would be wrong... please ignore it.
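
          As a minimal standalone illustration (plain Java, not ZooKeeper code) of why the nested locking above does not deadlock:

              // the same thread may re-enter a monitor it already holds,
              // so shutdownAll() -> remove() below never blocks on itself
              class HandlerRegistry {
                  private final java.util.List<Object> learners = new java.util.ArrayList<>();

                  void shutdownAll() {
                      synchronized (learners) {
                          for (Object h : new java.util.ArrayList<>(learners)) {
                              remove(h);              // re-acquires the same lock
                          }
                      }
                  }

                  void remove(Object h) {
                      synchronized (learners) {       // same monitor, same thread: reentrant, no deadlock
                          learners.remove(h);
                      }
                  }
              }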

          mitake Hitoshi Mitake added a comment -

But the attached logs (at DEBUG level) don't contain any messages from QuorumPeer.updateServerState(). Perhaps the leader's shutdown process is stopping the QuorumPeer main thread?

          ziyouw Ziyou Wang added a comment -

Sorry, I had some problems receiving mail at my registered mail address last week.

The situation is quite simple: start one ZK server, then reconfig on server 1 to add the new server to the cluster, then start and reconfig the next server. But step 4 should finish before step 3, because when we increase the node count from 1 to 2, ZK must make sure the cluster still has a majority after the reconfig. After the second server has been added to the cluster, server 3 is started and reconfigured to become part of the cluster.

          ziyouw Ziyou Wang added a comment -

Sorry, it is commercial code and I don't have the right to share it. But the environment is quite simple: I just run each ZK server in one Docker container, and all the containers run in the same VM. I think the key factor in this environment is that the disk latency is longer than in normal cases, and the problem becomes even more serious if all of them are running on a slow disk.

          arshad.mohammad Mohammad Arshad added a comment -

We also faced this issue.
          The problem occurs when the reconfig's PROPOSAL and COMMITANDACTIVATE arrive between the snapshot and the UPTODATE.
          The following steps can be followed to reproduce this issue very easily:

          1. Start a three-server ZooKeeper cluster; let's say the servers are server-1, server-2, and server-3.
          2. Create a large amount of data in ZooKeeper, around 150 MB.
          3. Install the fourth server, server-4, and add the server information of all four servers to server-4's config:
            server.1=192.168.1.3:2888:3888:participant
            server.2=192.168.1.3:2889:3889:participant
            server.3=192.168.1.3:2890:3890:participant
            server.4=192.168.1.2:2890:3890:participant
            
          4. Connect to any of the existing servers.
          5. Start server-4 and immediately run the reconfig command from the already connected client:
            reconfig -add server.4=192.168.1.2:2890:3890:participant;2181
          6. Open the zookeeper/conf folder; you will find zoo.cfg.dynamic.next and the existing quorum dynamic configuration file zoo.cfg.dynamic.100000000:
            zoo.cfg.dynamic.next --> this has information of all the servers
            zoo.cfg.dynamic.100000000 --> this has information of only existing servers server-1,server-2,server-3
          7. Even though server-4 started and joined the quorum, if you try to restart it, it will fail with the following error:
            2016-07-24 11:00:11,689 [myid:4] - ERROR [main:QuorumPeerMain@98] - Unexpected exception, exiting abnormally
            java.lang.RuntimeException: My id 4 not in the peer list
            		at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:748)
            		at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:183)
            		at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:120)
            		at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
            
          arshad.mohammad Mohammad Arshad added a comment -

The issue occurs when the reconfig's PROPOSAL and COMMITANDACTIVATE arrive between the snapshot and the UPTODATE while syncing with the leader.
          In the existing code the reconfig commit is not processed as it should be for a follower. In the observer case, the reconfig's commit is processed properly.
          We can fix this issue by processing the reconfig's commit for the follower in the same way it is processed for the observer.
          Submitting the fix.

          shralex Alexander Shraer added a comment -

Could you please provide the logs?

          The existence of the .next file indicates that there was a failure in the middle of the reconfig, and the COMMITANDACTIVATE
          message didn't arrive at the server on which you found this file. Which server was it? Just 4, or all of them?
          The zoo.cfg.dynamic.100000000 file is the old configuration.

          Did the other servers continue to operate normally? Did they reboot? Were they able to serve requests afterwards? It would be helpful if you describe this too.

          Step 7 is actually expected if server 4 crashed before it got the COMMITANDACTIVATE message. I described this in the manual:

          "Finally, note that once connected to the leader, a joiner adopts the last committed configuration, in which it is absent (the initial config of the joiner is backed up before being rewritten). If the joiner restarts in this state, it will not be able to boot since it is absent from its configuration file. In order to start it you’ll once again have to specify an initial configuration."

          arshad.mohammad Mohammad Arshad added a comment -

The existence of the .next file indicates that there was a failure in the middle of the reconfig, and the COMMITANDACTIVATE message didn't arrive at the server on which you found this file.

A failure is one scenario in which the .next file is not deleted, but it is not the only one. In this scenario the .next file exists because of a logical problem in the code.
          The COMMITANDACTIVATE message arrives but is not processed, for the following reason.
          Sequence of events:
          1) case Leader.NEWLEADER:
          lastSeenQuorumVerifier is updated with 100000000 and the .next file is created.
          2) case Leader.PROPOSAL (reconfig):
          lastSeenQuorumVerifier is updated with 200000000 and the earlier .next file is overwritten.
          3) case Leader.COMMITANDACTIVATE:
          Because the snapshot is taken in step 1, snapshotTaken=true and self.processReconfig() is not called. This call was supposed to delete the .next file and create the updated zoo.cfg.dynamic.200000000 file.
          code reference:

          if (!snapshotTaken) {
              ----
              boolean majorChange =
                  self.processReconfig(qv, ByteBuffer.wrap(qp.getData()).getLong(), qp.getZxid(), true);
              ----
          }
          

4) case Leader.UPTODATE:
          This calls self.processReconfig(), but again it is skipped because the lastSeenQuorumVerifier version is higher; it got updated in step 2).

          public synchronized QuorumVerifier setQuorumVerifier(QuorumVerifier qv, boolean writeToDisk) {
              if ((quorumVerifier != null) && (quorumVerifier.getVersion() >= qv.getVersion())) {
          
          

Which server was it? Just 4, or all of them?

          Just server 4.

          Step 7 is actually expected if server 4 crashed before it got the COMMITANDACTIVATE message. I described this in the manual:

          But the server did not crash. It is in the normal flow.

          Could you please provide the logs?

          Can you please try to reproduce it with the above steps, so we can reach a conclusion fast? Let me know; if it does not reproduce, I will reproduce it and share the logs.

          shralex Alexander Shraer added a comment -

Hi Arshad, your patch makes sense to me (as you say, processReconfig should be called, and it wasn't when the snapshot flag was on),
          but we probably shouldn't commit it without a test. Since you have a clear scenario where this fails without the patch, would you be able to add a test?
          I'm away until Aug 8 but can review when I'm back.

          arshad.mohammad Mohammad Arshad added a comment -

Alexander Shraer, please find the new patch ZOOKEEPER-2172-03.patch, which includes a test case as well.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12822094/ZOOKEEPER-2172-03.patch
          against trunk revision 1755100.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3323//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3323//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3323//console

          This message is automatically generated.

          shralex Alexander Shraer added a comment -

          Arshad Mohammad, thanks for adding the test!

Does this test consistently fail without your fix? The reason I'm asking is that you pointed out previously that the issue occurs when a large snapshot is being taken, whereas in the test I don't see that the state is large. How do you make sure that COMMITANDACTIVATE is received while snapshotting?

          arshad.mohammad Mohammad Arshad added a comment -
1. Without the fix, I ran the test 10 times: it failed 9 times and passed once.
            With the fix: it passed 10 times out of 10 runs.
            In the first place, covering this reconfig scenario through test cases is very difficult.
            This test case has been added to give the reviewer confidence in the fix. I hope it serves the purpose.
          2. How do you make sure that COMMITANDACTIVATE is received while snapshotting?

            Not while snapshotting, but in between snapshotting and the UPTODATE message.
            Before sending Leader.ACK a sleep is added, which gives a bigger time window for the reconfig PROPOSAL and COMMITANDACTIVATE to arrive,
            and finally the packet arrival sequence becomes Leader.NEWLEADER (snapshotting done here) --> reconfig Leader.PROPOSAL --> reconfig Leader.COMMITANDACTIVATE --> Leader.UPTODATE (see the sketch below).
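
          For illustration, a rough sketch of that delaying trick (a hypothetical test helper, not the committed test code; the DelayingFollower name, the delayNextAck flag, and the 5-second sleep are made up for the example, and the class is assumed to live in the org.apache.zookeeper.server.quorum package):

              // a Follower whose writePacket() holds back the ACK it sends in response to NEWLEADER,
              // widening the window in which the reconfig PROPOSAL and COMMITANDACTIVATE can arrive
              // before UPTODATE
              class DelayingFollower extends Follower {
                  volatile boolean delayNextAck = false;   // set by the test once NEWLEADER has been received

                  DelayingFollower(QuorumPeer self, FollowerZooKeeperServer zk) {
                      super(self, zk);
                  }

                  @Override
                  void writePacket(QuorumPacket pp, boolean flush) throws java.io.IOException {
                      if (delayNextAck && pp != null && pp.getType() == Leader.ACK) {
                          try {
                              Thread.sleep(5000);          // give the leader time to send the reconfig first
                          } catch (InterruptedException e) {
                              Thread.currentThread().interrupt();
                          }
                          delayNextAck = false;
                      }
                      super.writePacket(pp, flush);
                  }
              }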

          shralex Alexander Shraer added a comment -

          Thanks Arshad, I verified that the test fails before the patch and passes afterwards. I realize that the patch is complex, and appreciate you writing it! I was trying to understand the logic.

          A few minor comments:

          • Please rename the test so that the name reflects what is being tested. Such as ReconfigDuringLeaderSync / ReconfigDuringSnapshot or something similar.
          • Please don't assume that SERVER_COUNT = 3 (so the extra server can be number 4). You could use a different variable in the test instead of SERVER_COUNT, or just give the new server an id based on SERVER_COUNT instead of 4.
          • The test shuts down the 4th server in one method and the other three in another. Consider doing this in the same place and also closing the client handles.
          • getQP could be static, maybe rename to getCustomQuorumPeer or something like that
          • Please add more comments explaining the logic of the test. For example, in writePacket please add a comment that you're delaying the ACK message a follower sends in response to a NEWLEADER message, so that the leader has a chance to send the reconfig and only then the UPTODATE (basically your comment in this thread above). Without the comment it's not clear why you're checking for ACK while setting the newLeaderMessage flag. (I hope I understood the logic correctly.) Similarly, when the 4th server is being started, please add a comment saying that this server will delay the response to a NEWLEADER message. Basically, more comments would be helpful.
          • In the code, shouldn't the condition of the if be "pif.hdr.getZxid() == qp.getZxid() && qp.getType() == Leader.COMMITANDACTIVATE"?
            Otherwise the logic inside the if may not be correct (the configuration info qv is extracted from pif).

          Maybe change to:

              if (pif.hdr.getZxid() != qp.getZxid()) {
                  LOG.warn("Committing " + qp.getZxid() + ", but next proposal is " + pif.hdr.getZxid());
              } else if (qp.getType() == Leader.COMMITANDACTIVATE) {

          shralex Alexander Shraer added a comment -

Another thing you could add is a check that the QuorumVerifier was updated: e.g., when you start the fourth server, do that with only itself and the leader in its config,
          and then at the end of the test check that its quorum verifier includes all 4 servers. Checking the file deletion is OK, but it's sort of a symptom of the fact
          that the reconfiguration didn't complete on that server, so this is an attempt to check the configuration explicitly.
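
          A minimal sketch of the kind of assertion meant here (hypothetical; newServer and its getQuorumPeer() accessor are placeholders for however the test exposes the fourth server's QuorumPeer):

              // check the active configuration directly instead of only the dynamic file on disk
              QuorumVerifier qv = newServer.getQuorumPeer().getQuorumVerifier();
              Assert.assertEquals("all four servers should be in the active config",
                      4, qv.getAllMembers().size());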

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12823593/ZOOKEEPER-2172-04.patch
          against trunk revision 1756262.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3363//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3363//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3363//console

          This message is automatically generated.

          arshad.mohammad Mohammad Arshad added a comment -

Thanks Alexander Shraer; I addressed all the comments in the latest patch.

          shralex Alexander Shraer added a comment -

          Thanks!
          I made a few more changes:

          • starting the new server with only itself and the leader in the initial config, so we can know that its config was really updated after the reconfig
          • changed the Learner.java logic a bit, to match what was there before ZK-107 (see revision history). Also I noticed that if majorChange = true we often skip important logic when throwing an exception, so I moved the exception part to the end of each code block.

          Can another committer take a look before we commit? Chris Nauroth Raul Gutierrez Segales Flavio Junqueira

          shralex Alexander Shraer added a comment -

          Mohammad Arshad, please take a look

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12823597/ZOOKEPER-2172-05.patch
          against trunk revision 1756262.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3365//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3365//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3365//console

          This message is automatically generated.

          shralex Alexander Shraer added a comment -

          On second thought, investigating the right way to throw the exception should probably be done in a separate jira, so I reverted some of my changes.

          shralex Alexander Shraer added a comment -

          I opened https://issues.apache.org/jira/browse/ZOOKEEPER-2513 as a followup.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12823602/ZOOKEEPER-2172-06.patch
          against trunk revision 1756262.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3366//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3366//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3366//console

          This message is automatically generated.

          arshad.mohammad Mohammad Arshad added a comment -

          Thanks Alexander Shraer for authoring the patch; it is much improved.
          It is good that the changes you introduced in ZOOKEEPER-2172-05.patch have been removed in the latest patch and that a new jira was created to address those concerns.
          I corrected the following nits in ZOOKEEPER-2172-06.patch and submitted a new patch (the code these fragments come from is sketched after this comment):

          ZOOKEEPER-2172-06.patch:10: trailing whitespace.
          ZOOKEEPER-2172-06.patch:30: trailing whitespace.
                                  boolean majorChange =
          ZOOKEEPER-2172-06.patch:31: space before tab in indent.
                                                  self.processReconfig(qv, ByteBuffer.wrap(qp.getData()).getLong(), qp.getZxid(), true);
          ZOOKEEPER-2172-06.patch:35: trailing whitespace.
                              }
          ZOOKEEPER-2172-06.patch:111: trailing whitespace.
               *
          
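          The fragments flagged in the whitespace nits above ("boolean majorChange =" and the processReconfig(...) call) belong to the reconfig handling this patch adds to Learner.java, one of the files listed in the commit below. The following is only a rough sketch of how those pieces fit together during leader sync, not a verbatim excerpt of ZOOKEEPER-2172-07.patch: here "self" is the learner's QuorumPeer, "qp" the QuorumPacket received from the leader, "newConfig" a stand-in for the configuration string carried by the reconfig transaction, and QuorumPeer#configFromString is assumed to be the helper used to parse it, as elsewhere in the reconfig code.

              // Rough sketch (assumptions noted above, not the committed patch):
              // apply a reconfig that arrives while the learner is still syncing.
              QuorumVerifier qv = self.configFromString(newConfig);
              // Arguments: new verifier, the long encoded at the start of the packet
              // data, the zxid of the reconfig, and a flag to restart leader election.
              boolean majorChange =
                      self.processReconfig(qv, ByteBuffer.wrap(qp.getData()).getLong(), qp.getZxid(), true);
              if (majorChange) {
                  // A change significant enough to require a new leader election
                  // aborts the current sync instead of leaving the cluster in the
                  // half-applied state described in this issue.
                  throw new Exception("changes proposed in reconfig");
              }
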
          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12823615/ZOOKEEPER-2172-07.patch
          against trunk revision 1756262.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 2 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3368//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3368//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/3368//console

          This message is automatically generated.

          shralex Alexander Shraer added a comment -

          +1 from me. Since I removed most of my changes, I feel comfortable committing this, but let's wait a week or so to let
          other people comment if they wish.

          Mohammad Arshad would you like to take a stab at ZOOKEEPER-2513 ?

          phunt Patrick Hunt added a comment -

          Given this has sign-off from Alex and it's been pending for a while, I've committed it to 3.5.3 and trunk. Thanks Arshad (and Alex)!

          hudson Hudson added a comment -

          FAILURE: Integrated in Jenkins build ZooKeeper-trunk #3070 (See https://builds.apache.org/job/ZooKeeper-trunk/3070/)
          ZOOKEEPER-2172: Cluster crashes when reconfig a new node as a participant (Arshad Mohammad via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1759907)

          • (edit) trunk/CHANGES.txt
          • (edit) trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java
          • (add) trunk/src/java/test/org/apache/zookeeper/server/quorum/ReconfigDuringLeaderSyncTest.java
          arshad.mohammad Mohammad Arshad added a comment -

          Thanks Bhupendra Kumar Jain and Ajith S for helping me analyse this issue.
          Thanks Alexander Shraer and Patrick Hunt for reviewing and committing the patch.
          Thanks everyone for participating in the discussion and providing useful information.


            People

            • Assignee:
              arshad.mohammad Mohammad Arshad
            • Reporter:
              ziyouw Ziyou Wang
            • Votes:
              0
            • Watchers:
              13
