ZOOKEEPER-107: Allow dynamic changes to server cluster membership

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: server
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      Currently, cluster membership is statically defined; adding and removing hosts to/from the server cluster dynamically needs to be supported.

      Attachments

      1. ZOOKEEPER-107.patch
        2 kB
        Michi Mutsuzaki
      2. ZOOKEEPER-107.patch
        0.6 kB
        Michi Mutsuzaki
      3. ZOOKEEPER-107.patch
        0.6 kB
        Michi Mutsuzaki
      4. ZOOKEEPER-107.patch
        2 kB
        Michi Mutsuzaki
      5. ZOOKEEPER-107-5-Mar-ver2.patch
        338 kB
        Alexander Shraer
      6. ZOOKEEPER-107-5-Mar.patch
        337 kB
        Alexander Shraer
      7. ZOOKEEPER-107-2-Mar.patch
        358 kB
        Alexander Shraer
      8. ZOOKEEPER-107.patch
        356 kB
        Michi Mutsuzaki
      9. ZOOKEEPER-107-4-Feb-ver2.patch
        340 kB
        Alexander Shraer
      10. ZOOKEEPER-107-4-Feb-ver2.patch
        340 kB
        Michi Mutsuzaki
      11. ZOOKEEPER-107-4-Feb-ver1.patch
        328 kB
        Alexander Shraer
      12. ZOOKEEPER-107-4-Feb.patch
        328 kB
        Alexander Shraer
      13. ZOOKEEPER-107-24-Jan.patch
        315 kB
        Alexander Shraer
      14. ZOOKEEPER-107-20-Jan.patch
        312 kB
        Alexander Shraer
      15. ZOOKEEPER-107-18-Jan-ver2.patch
        312 kB
        Alexander Shraer
      16. ZOOKEEPER-107-18-Jan.patch
        312 kB
        Alexander Shraer
      17. ZOOKEEPER-107-17-Jan.patch
        308 kB
        Alexander Shraer
      18. ZOOKEEPER-107-16-Jan.patch
        280 kB
        Alexander Shraer
      19. ZOOKEEPER-107-14-Jan.patch
        279 kB
        Alexander Shraer
      20. ZOOKEEPER-107-28-NOV-ver2.patch
        311 kB
        Alexander Shraer
      21. ZOOKEEPER-107-7-NOV-ver2.patch
        217 kB
        Alexander Shraer
      22. ZOOKEEPER-107-7-NOV-ver1.patch
        216 kB
        Alexander Shraer
      23. ZOOKEEPER-107-7-NOV.patch
        216 kB
        Alexander Shraer
      24. ZOOKEEPER-107-6-NOV-2.patch
        216 kB
        Alexander Shraer
      25. ZOOKEEPER-107-15-Oct-ver3.patch
        217 kB
        Marshall McMullen
      26. ZOOKEEPER-107-15-Oct-ver2.patch
        217 kB
        Marshall McMullen
      27. ZOOKEEPER-107-15-Oct-ver1.patch
        217 kB
        Marshall McMullen
      28. ZOOKEEPER-107-15-Oct.patch
        217 kB
        Marshall McMullen
      29. ZOOKEEPER-107-14-Oct.patch
        217 kB
        Marshall McMullen
      30. ZOOKEEPER-107-3-Oct.patch
        214 kB
        Alexander Shraer
      31. ZOOKEEPER-107-23-SEP.patch
        213 kB
        Alexander Shraer
      32. ZOOKEEPER-107-Aug-25.patch
        221 kB
        Alexander Shraer
      33. ZOOKEEPER-107-Aug-20-ver1.patch
        229 kB
        Alexander Shraer
      34. ZOOKEEPER-107-Aug-20.patch
        229 kB
        Alexander Shraer
      35. ZOOKEEPER-107-21-July.patch
        210 kB
        Alexander Shraer
      36. ZOOKEEPER-107-20-July.patch
        271 kB
        Alexander Shraer
      37. zkreconfig-usenixatc-final.pdf
        532 kB
        Alexander Shraer
      38. ZOOKEEPER-107-22-Apr.patch
        178 kB
        Alexander Shraer
      39. ZOOKEEPER-107-1-Mar.patch
        264 kB
        Alexander Shraer
      40. ZOOKEEPER-107-29-Feb.patch
        262 kB
        Alexander Shraer
      41. ZOOKEEPER-107-28-Feb.patch
        260 kB
        Alexander Shraer
      42. ZOOKEEPER-107-28-Feb.patch
        260 kB
        Alexander Shraer
      43. zookeeper-reconfig-sep12.patch
        275 kB
        Alexander Shraer
      44. zookeeper-dev-fatjar.jar
        3.31 MB
        Alexander Shraer
      45. zookeeper-3.4.0.jar
        1.26 MB
        Alexander Shraer
      46. zoo_replicated1.members
        0.1 kB
        Alexander Shraer
      47. zoo_replicated1.cfg
        0.2 kB
        Alexander Shraer
      48. zookeeper-reconfig-sep11.patch
        276 kB
        Alexander Shraer
      49. SimpleAddition.rtf
        6 kB
        Raghu S


          Activity

          Patrick Hunt added a comment -

          Submitted to me by a user. This describes a change both for servers and for clients. Currently, servers
          share a configuration file that statically defines the cluster members (servers). Additionally, clients are statically
          configured with a list of accessible servers.

          "Instead of every client maintaining a list of zookeeper servers, the servers should maintain
          that info (e.g. in a special 'node') and handle updates via the server-to-server protocol.
          Then the client just needs to know the server:port of one zookeeper server (or a bunch of
          'forwarding zookeepers' for redundancy) that it talks to, and the servers take it from there.

          If one server gets added to the collective, the server-to-server protocol should propagate it
          among all servers and all servers update their maps. Same if a zookeeper server gets moved out
          of rotation: there should be an internal protocol to handle this and have all servers update
          their maps."

          Jakob Homan added a comment - edited

          A thought I had along this line was to create an additional constructor that takes a URL rather than a string for the host. The constructor could then access that URL and get the list of servers from there. So, assuming the URL pointed to an HTTP page, the page returned would just be "hostA:port,hostB:port,etc" and the client could proceed with that information. Or, the URL could point to a local file if the server membership wasn't expected to change often. This would eliminate the need for the clients to have any idea ahead of time of where to get the hosts. Particularly if the information were served via HTTP, this would move server location to being a DNS/virtualhost problem - the DNS for the server-info location could be changed if the server providing the info died. This would go a ways toward adding a RESTful interface to the configuration of the cluster.

          So, the client would instantiate with
          zk = new ZooKeeper(new URL("http://zkhostinfo:7552"), 1000, this);
          receive the list of hosts, and work from there.

          This would address (though not solve) the issue of adding servers, as new servers could be added to the list returned to new clients.

          As a corollary, the zk servers could of course serve the list of hosts as an optional part of their operation, or this function could be provided by another application, directory-style. This would allow clients to connect to different clusters on start-up, if the directory could identify and differentiate clients by ip addr or such, and direct them to the appropriate zk group.

          If this sounds (reasonable && interesting), I can open a JIRA and work on a patch to add the new constructor and client functionality.
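
          To make this concrete, here is a minimal sketch of the proposed client-side flow, assuming the URL simply returns a plain-text "hostA:port,hostB:port" list; the URL, port, and helper method below are illustrative, not an existing API:

          import java.io.BufferedReader;
          import java.io.InputStreamReader;
          import java.net.URL;
          import org.apache.zookeeper.ZooKeeper;

          public class UrlBootstrapExample {
              // Hypothetical helper: fetch a comma-separated host:port list from a URL.
              static String fetchConnectString(URL url) throws Exception {
                  try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
                      return in.readLine().trim();   // e.g. "hostA:2181,hostB:2181"
                  }
              }

              public static void main(String[] args) throws Exception {
                  // The URL is illustrative; it could be an HTTP server or a file: URL
                  // maintained outside the ensemble.
                  String hosts = fetchConnectString(new URL("http://zkhostinfo:7552/servers"));
                  // Standard client constructor: connect string, session timeout, watcher.
                  ZooKeeper zk = new ZooKeeper(hosts, 30000, event -> { });
                  System.out.println("connected via " + hosts + ", state: " + zk.getState());
                  zk.close();
              }
          }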

          Benjamin Reed added a comment -

          +1 I like the idea. You can currently use DNS for this functionality: make zookeeper.acme.com resolve to 5 different IP addresses and then specify new ZooKeeper("zookeeper.acme.com:3233", 1000, this), but DNS is hard to modify. A replicated webserver would be much easier to update.

          Patrick Hunt added a comment -

          Obviously it would be great if we supported reading from a ZooKeeper cluster!

          This just reminded me of another comment I got recently on this. The suggestion was to use a URI (similar to JDBC, for example) rather than a host/port list.

          Perhaps we should have some sort of plugin architecture here, where the URI would be provided and each registered plugin would resolve the host/port mapping based on the scheme.
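
          A rough sketch of what such a scheme-keyed plugin might look like; the HostListResolver interface and registry below are hypothetical, not existing ZooKeeper APIs:

          import java.net.InetSocketAddress;
          import java.net.URI;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Hypothetical plugin contract: turn a URI into a concrete server list.
          interface HostListResolver {
              List<InetSocketAddress> resolve(URI uri) throws Exception;
          }

          // Hypothetical registry keyed by URI scheme ("zk", "http", "file", ...).
          class HostListResolvers {
              private static final Map<String, HostListResolver> RESOLVERS = new HashMap<>();

              static void register(String scheme, HostListResolver resolver) {
                  RESOLVERS.put(scheme, resolver);
              }

              static List<InetSocketAddress> resolve(URI uri) throws Exception {
                  HostListResolver r = RESOLVERS.get(uri.getScheme());
                  if (r == null) {
                      throw new IllegalArgumentException("no resolver registered for scheme " + uri.getScheme());
                  }
                  return r.resolve(uri);
              }
          }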

          Hiram Chirino added a comment -

          I personally think that this needs to stay decoupled so that group membership can be controlled via different implementations. In other words, I think that the QuorumPeer should not have to have any constructor args for it to know its peers. It should persistently store/remember the list of peers that were part of the group when it last started.

          Not sure if it makes sense to keep that list in the ZK db or not.

          When a node that is not part of a cluster first starts up, it needs to know if it's starting a new cluster or if it is joining an existing cluster. Therefore, I think the QuorumPeer class needs methods like the following:

          /**
           * Contacts a ZK server in the cluster, adds this peer to the cluster and gets a listing of the rest of the peers in
           * the cluster.
           *
           * Optional: if slaveOnly is true, then this peer should never be elected master.
           *
           * Throws an error if this peer is already part of a cluster.
           */
          void joinCluster(URI server, boolean slaveOnly)

          /**
           * Starts this peer as the first node in the cluster and makes it the master.
           *
           * Throws an error if this peer is already part of a cluster.
           */
          void createCluster()

          /**
           * Removes this peer from the peer list maintained by the cluster.
           *
           * Throws an error if this peer is not part of a cluster.
           */
          void leaveCluster()

          /**
           * Gets a list of peers in the cluster.
           *
           * @return null if not part of a cluster yet.
           */
          List<URI> getClusterPeers()
          

          If methods like the above are available, then an administrator can dynamically manage adding/removing nodes on an existing ZooKeeper cluster, or some automated agent could do it. Note that the peer list needs to get replicated to all cluster members and persisted to avoid split-brain issues on peer restart. Operations like joinCluster(), leaveCluster(), and getClusterPeers() would block until a master is elected in the cluster.

          Please note the 'nice to have' feature where you have the ability to designate some peers as NOT being eligible to become a master. This would allow you to support heterogeneous peers, and enforce that only the higher-end machines can become masters.
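
          A sketch of how an administrator or automated agent might drive such an API; the ClusterAwarePeer interface below simply mirrors the proposed methods and does not exist in ZooKeeper today:

          import java.net.URI;
          import java.util.List;

          class MembershipAdminExample {
              // Hypothetical interface matching the methods proposed above.
              interface ClusterAwarePeer {
                  void createCluster();
                  void joinCluster(URI server, boolean slaveOnly);
                  void leaveCluster();
                  List<URI> getClusterPeers();
              }

              static void bootstrapFirstNode(ClusterAwarePeer peer) {
                  // First node of a brand-new ensemble: it becomes the initial master.
                  peer.createCluster();
              }

              static void growCluster(ClusterAwarePeer newPeer, URI existingServer) {
                  // Later nodes join through any reachable member; slaveOnly = true
                  // would mark a low-end machine as never eligible for leadership.
                  newPeer.joinCluster(existingServer, false);
                  System.out.println("current members: " + newPeer.getClusterPeers());
              }
          }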

          Benjamin Reed added a comment -

          I think there are two issues here: 1) adding/removing servers to a ZooKeeper cluster and 2) letting clients know about the change. We should probably separate them. I like the URL idea for dealing with 1) (especially when used in conjunction with the other idea in this Jira of defining a URL scheme for ZooKeeper). For 2) I agree with Hiram that it should be stored persistently at each replica and changed via the replication protocol.

          Mahadev konar added a comment -

          +1 for using URIs on the client side to get a list of zookeeper servers. We can always update the zookeeper client periodically by fetching from the URI...

          Patrick Hunt added a comment -

          In my comment "URI rather than a host/port list" I was specifically referring to the client's host/port list used to specify the servers to which the client should connect. Probably a good idea to use something like this on the servers as well.

          Regarding the idea of joining/leaving a cluster, this sounds good. How does this mesh with the common case of starting up a set of 5 servers forming a new cluster? Specifically, the idea of operations blocking (Hiram's comment) until a master is elected. Not sure I see how this works...

          Henry Robinson added a comment -

          This is something I'd be willing to work on.

          Just to sum up my current understanding of the requirements:

          1. Must support an off-cluster getPeers operation for a recovering peer to bootstrap itself (it can cache the list in its own persistent storage, but that could potentially be out of date by recovery time). This is probably best realised with the URI idea as before.

          2. Support for join and leave operations. With a quiescent cluster, join is probably as simple as a sync followed by a commit of the new peer's id to all followers (if nothing else, this ensures that if one of them should be elected the master, they know how big the quorum should be). Leaves are similar, without the sync obviously. If a peer leaves before the Leave() operation completes, it will look like a crash.

          3. If joining / leaving a cluster that doesn't have a currently elected master, block until one exists. If the cluster is currently failed due to f+1 failures, it might be necessary to time out in order to prevent being permanently blocked if this is in the middle of a code path.

          4. However, if joining / leaving a cluster that has never bootstrapped, it's important to do something different so as to allow the cluster to achieve a quorum. One solution is for a node to check whether its id is in the list of peers at the cluster URI, which will tell it if it was ever a member of the cluster previously (or part of the initial membership), and then participate in master elections (a sketch of this check follows this comment). This places a requirement on the peer list to be kept reasonably accurate (but this could only affect liveness, not safety, I think).

          Please chime in with comments / stuff that I've missed / bugs, otherwise I'll work on fleshing this out.
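
          A minimal sketch of the membership check described in point 4, assuming the cluster URI serves a plain-text list of server ids, one per line; the format and helper are illustrative:

          import java.io.BufferedReader;
          import java.io.InputStreamReader;
          import java.net.URL;
          import java.util.HashSet;
          import java.util.Set;

          class BootstrapMembershipCheck {
              // True if this peer's id appears in the list published at the cluster URI,
              // i.e. it was part of the membership at some point and may take part in
              // leader election.
              static boolean wasEverMember(URL clusterUri, long myServerId) throws Exception {
                  Set<String> ids = new HashSet<>();
                  try (BufferedReader in = new BufferedReader(new InputStreamReader(clusterUri.openStream()))) {
                      String line;
                      while ((line = in.readLine()) != null) {
                          ids.add(line.trim());
                      }
                  }
                  return ids.contains(Long.toString(myServerId));
              }
          }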

          Benjamin Reed added a comment -

          sounds great henry! it would be great if you could work on this.

          i think we have two strategies:

          1) have the cluster agree on a list of servers and use the atomic broadcast to agree on changes. (this might be a bit more difficult with the flexible quorum configuration. right flavio?) this is mostly in line with your first three points. btw, i don't think you need to quiesce for this or even do the sync. i think you can do a conditional update.

          2) use some external resource file indicated by a URL to define the machines that make up a cluster. this is in line with your last point and you hint at this with your first point.

          i think the first approach is safer and more reliable. the second is easier to implement and easier to see what is going on, but during transition time you have a problem as the resource file propagates through the cluster. (you could have different members with different views.)

          the thing i was thinking of for the first option is exposing the cluster config through a znode '/.zookeeper/ensemble' or something like that. then changing the configuration would be as "simple" as conditionally setting a new version of that file. the tricky part is that you could only commit the change if you have a quorum of followers in both the old and the new configuration. this seems to be in line with what you are thinking, correct?
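
          A minimal sketch of the conditional-update idea using the standard client API, assuming a hypothetical '/zookeeper/ensemble' config znode; the path and payload format are assumptions:

          import org.apache.zookeeper.KeeperException;
          import org.apache.zookeeper.ZooKeeper;
          import org.apache.zookeeper.data.Stat;

          class EnsembleZnodeUpdate {
              // Conditionally replace the ensemble definition: the write only succeeds if
              // nobody changed the config since we read it (compare-and-set on the znode
              // version), which is what makes concurrent reconfigurations safe to retry.
              static void proposeNewEnsemble(ZooKeeper zk, byte[] newConfig)
                      throws KeeperException, InterruptedException {
                  Stat stat = new Stat();
                  byte[] current = zk.getData("/zookeeper/ensemble", false, stat);
                  System.out.println("current config is " + current.length + " bytes, version " + stat.getVersion());
                  // Throws KeeperException.BadVersionException if the config changed in the meantime.
                  zk.setData("/zookeeper/ensemble", newConfig, stat.getVersion());
              }
          }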

          Raghu S added a comment -

          Using a URL for obtaining a list of servers may not be acceptable in all environments. The URL will have to be highly available, otherwise the clients won't connect after a restart. It applies to ZK peers as well: a peer won't be able to restart if the URL is down at the time of peer restart. So I believe (1) is a better choice here – the cluster needs to be self-contained.

          IIUC, using a ZK node to expose the cluster config would mean that the cluster can't be created unless the cluster config is pre-populated? We need to first create a log file that contains a config node that has the initial cluster members, copy them to all the nodes and power them on to create the cluster? Once the cluster is formed, new nodes can be admitted to the cluster by having a quorum commit the new server name in the cluster config.

          Another alternative would be to have a single node come up first (with the log file pre-populated to contain a single server in the cluster config node) and then have all other servers join one at a time? This sounds a bit weird to me in that the first node will have to form a single-node cluster in the beginning, which is an oxymoron.

          Benjamin Reed added a comment -

          i agree with everything you are saying and yes to all the questions. it's not as strange as it sounds. today we have to pre-populate the cluster config. it would just be that now, rather than creating a file with vi, we would need to use a utility to create an initial snapshot that has the config in it. i think this would also help with some deployment errors by tightly tying the data to the cluster config. the previously mentioned utility would also allow you to avoid having to start with a single-node cluster and growing from there.

          Mahadev konar added a comment -

          Henry,
          one thing I would like to point out: please post a concrete proposal (since this involves the core internals of zookeeper) before you start working on this, so that there is agreement and no wasted effort...

          Henry Robinson added a comment -

          I agree with pretty much everything I've read here (in particular, the importance of getting consensus!), but wanted to clarify my initial comment a bit.

          Rather than choose between strategies 1 and 2 as outlined by Benjamin, I think there's a hybrid approach needed.

          If a node is a member of a quorate cluster, then the most up to date membership information should be available to it in a znode. I think this is the most elegant approach, and is trivially achieved by pushing join/leave requests through the atomic broadcast pipeline.

          If a node is joining the cluster, it needs to be able to bootstrap the location of the cluster from somewhere. There therefore needs to be an externally available resource containing a list of machines in the cluster that is accurate for at least one machine (as a joining node will try all servers in that list in turn). When I say available at some URI, this is what I mean. Currently, this information is kept statically at a URI that addresses conf/zoo.cfg on the local filesystem. I suggest generalising that to an arbitrary URI. One nice property is that it then does not tie a cluster to a particular machine, as the URI provides a level of indirection.

          It is then the cluster administrator's responsibility to keep this URI up-to-date (although of course this should be automated), possibly via a client that just pulls membership information from the cluster periodically. As I said earlier, it's only important for the contents of this list to have one node in common with the true membership of the cluster, so it's allowed to get a bit out of sync. We can certainly easily imagine ways that ZK can help here. Of course the URI must be highly available, but it also has to exist, otherwise we could have 'orphaned' clusters that are running on machines whose identity we don't necessarily know. The URI can be a front for almost any scheme we like - periodic heartbeating of live nodes is one.

          The format of this file can be anything at all - from a serialised snapshot to a list of ip:port pairs, as long as it contains enough information for a client to find the cluster. Personally I would prefer human readable, simple formats.

          To talk about recovery for a moment: when a node recovers from a crash and rejoins the cluster, it can help the cluster elect a master if the cluster is currently non-quorate. This is because it was originally part of the cluster, and therefore the protocol guarantees that a quorum of nodes including the recovering one will have seen all committed proposals (this is important to correctness).

          If the node was not originally a member of the cluster, it must not help get a master elected as it cannot be part of a quorum. Similarly, a node cannot query the cluster to find out if it was originally a member because the quorum required to do so might not exist. Therefore every node that ever successfully joins a cluster must store this fact in its own persistent storage, as only it can know whether it is permitted to help run the election.

          Finally, the startup problem. Given a URI, nodes can bootstrap themselves onto a cluster simply by being told to start in startup mode. Alternatively, a single node can be distinguished (again, in the URI contents perhaps) which will start in single-node mode and process join requests one-by-one.
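
          A sketch of the kind of helper client described above, assuming the membership lives in a hypothetical '/zookeeper/ensemble' znode and the bootstrap URI is backed by a plain file; all paths and formats here are illustrative:

          import java.nio.file.Files;
          import java.nio.file.Path;
          import java.nio.file.Paths;
          import org.apache.zookeeper.ZooKeeper;

          class MembershipPublisher {
              // Periodically copy the ensemble's own view of its membership out to the
              // externally served bootstrap file; it only has to stay roughly in sync.
              static void publishLoop(ZooKeeper zk, Path bootstrapFile) throws Exception {
                  while (true) {
                      byte[] members = zk.getData("/zookeeper/ensemble", false, null);
                      Files.write(bootstrapFile, members);   // e.g. "hostA:2888,hostB:2888,..."
                      Thread.sleep(60_000);                  // refresh once a minute
                  }
              }

              public static void main(String[] args) throws Exception {
                  ZooKeeper zk = new ZooKeeper("hostA:2181,hostB:2181", 30000, event -> { });
                  publishLoop(zk, Paths.get("/var/www/zk/servers.txt"));
              }
          }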

          Benjamin Reed added a comment -

          the information needed for bootstrapping is the same as the information needed for a normal zookeeper client, so it could either use the standard string that is a list of host:port pairs, or it could use the scheme proposed in ZOOKEEPER-390. with that URL it could fetch /.zookeeper/ensemble and grab the configuration information that it needs. conf/zoo.cfg isn't really a good URI for this purpose since it doesn't really have the needed client ports. plus there is information in zoo.cfg that is particular to a given server. for example, the data and log directories may be different on all the machines. the client port should also probably stay in the zoo.cfg. the server list and probably the timing variables should be stored in a znode and maintained with the atomic broadcast.

          recovery is a bit more than you mention, but at the same time simpler. first off, to change quorum configuration you must commit the change in both the old quorum configuration and in the new quorum configuration. for example, if you have the configuration A, B, C and you are changing to A, B, C, D, E you must be able to get quorum in both the old and new configuration for the change to work. if only A and B are up or A, D, and E are up you cannot commit the change. this means that the leader should check the new configuration carefully before proposing it, because we always roll the proposals forward; we never roll back.

          so really a zookeeper server doesn't know whether it is able to participate or not, the election will sort it out. a simple example is an ensemble A, B, C, D, E. E goes down. the last zxid it saw was <57,3>. while it is down the quorum configuration gets changed to A, B, C by <57,52>. let's say there is a leadership change and at <58,6> the power goes out and comes back on. E now tries to vote (it thinks it is permitted to participate), but it won't win any election since its zxid is too low. A, B, and C will ignore E's votes anyway because they know that E has been removed from the ensemble.
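
          A small sketch of the dual-quorum rule described above, using simple majorities (real ZooKeeper quorum checks go through a QuorumVerifier, so this is only illustrative):

          import java.util.Set;

          class DualQuorumCheck {
              // A reconfiguration may only commit once the acking servers form a
              // majority of the old view AND a majority of the new view.
              static boolean canCommitReconfig(Set<Long> ackers, Set<Long> oldView, Set<Long> newView) {
                  return isMajority(ackers, oldView) && isMajority(ackers, newView);
              }

              private static boolean isMajority(Set<Long> ackers, Set<Long> view) {
                  long acked = view.stream().filter(ackers::contains).count();
                  return acked > view.size() / 2;
              }
          }

          With the example above, acks from only A and B satisfy a majority of the old view {A, B, C} but not of the new view {A, B, C, D, E}, so the change cannot commit.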

          Raghu S added a comment -

          I think there are some corner cases that may make the leader election impossible during a node addition. Say the current config is A,B,C and the new config is A,B,C,D. When the leader is trying to commit the new configuration, the power goes out and comes back on when only A and B have logged the new configuration. Peer count in <A,B,C,D> = <4,4,3,3> now. An election is not possible if C is down because A and B think the majority is 3 peers and D can't participate in the election since it hasn't joined the cluster yet.

          It sounds like some out-of-band communication between an existing peer and a new peer is needed to make this work. If a peer restarts or notices quorum loss, and the last logged update is a node addition, the peer should try to contact the newly added server so that it can push its log to the new peer (if the new peer doesn't already have an up-to-date log) and ask the new peer to restart. Until A or B do that in the above case, an election may not be possible.

          Benjamin Reed added a comment -

          i don't think your example is a problem. it looks like transaction 4 is the one that added D. there is nothing that prevents D from participating in leader election. if C was up, it wouldn't nominate D, but A and B will, so either A or B will get elected and both recognize D as a valid member.

          while we are talking about corner cases, problems do come when you go the other way: you have A, B, C, and D, and you want to delete D, but C and D are down. you can get quorum in the new configuration with A and B, but you can't get quorum from the old configuration since C and D are down. for this you need manual intervention since there is no safe way (without risking future split brain) to remove D with C down.

          i think in radical cases like the last one, manual reconfiguration and bouncing the cluster is the best option. we do not want it to be easy since it risks having problems in the future.

          Raghu S added a comment -

          Folks, I have attached a high-level write-up on how we can implement dynamic configuration changes in a ZK ensemble. Please comment on this and let me know if you think this approach makes sense. Since I just came up with this, there could be corner cases that I may need to handle.

          Please note that this write-up doesn't focus on how we can make the configuration details available to the clients using a URI. I believe that can be dealt with separately.

          Henry Robinson added a comment -

          Hi -

          Thanks for the proposal - it does a really good job of framing the important questions.

          I am in favour of a solution that uses ZAB and the existing consensus framework for dynamic group membership. I believe this can be achieved without an out-of-band protocol or significant changes to the way the current protocols work; this has the advantage of keeping things simple.

          I'm not certain I've read your proposal correctly, but it seems that step 6 has followers commit the CONFIGCHANGE proposal on receipt, rather than waiting for a COMMIT message. By my understanding of ZAB, this means there is a possibility where fewer than a quorum of followers will commit this proposal, if the leader fails halfway through sending the proposal messages, leading to the possibility of divergent histories at followers.

          The tool approach is one way of wrapping up the authentication required if an ensemble wishes to restrict those nodes that can join it. Currently there is some implicit authentication done as the leader only establishes connections with followers that belong to the static membership. However there's certainly a need, as a result of this JIRA, for a better authentication mechanism inside ZK. I see this as orthogonal to the mechanisms required to do dynamic membership.

          I suggest that we simply augment the current ZooKeeper protocol with four new proposals: NEWVIEW, GETVIEW, JOIN and LEAVE. NEWVIEW proposes an entirely new view, and may aggregate many JOIN or LEAVE proposals into one. Since NEWVIEW likely requires knowledge of the current view, GETVIEW returns the current view and its version. JOIN and LEAVE incrementally change the current view, whatever it is, and so do not require a GETVIEW call to establish the current view.

          All proposals go through the usual ZAB two-phase protocol, except for the fact that the leader coordinating the current ZAB instance must wait for acknowledgements from quorums in both the current and new view before committing the change.

          It's possible that this can lead to the proposal blocking if a quorum cannot be assembled in either view. Although it might seem an error that the proposal will block even if a quorum in the current view can be established, the same behaviour would be observed even if the proposal could be committed - all subsequent proposals would require a quorum from the new view and would block.

          If an ensemble is currently blocked due to the failure of n/2 + 1 nodes, it is not possible to resume progress by issuing a LEAVE on behalf of the failed nodes; however in general failed nodes may both JOIN and LEAVE the ensemble.

          If a leader election is required during a proposal, there are no correctness issues assuming the current required invariants of ZAB leader election hold. In particular, as long as the new leader has seen the most recent proposals then the view change proposal will be committed once the new leader is elected. This property will be maintained without changes to the current leader election protocols - as the view change proposal will have been seen by a quorum from the current view, the new leader is guaranteed to have a record of the proposal.

          A node that fails after it has issued a join proposal, but before it hears of its success, must be able to find the status of the proposal once it recovers. There are several ways to do this.

          I have some sketches of correctness proofs for this and could produce a more detailed design document if required - however, if there's consensus that this is the right approach I'd rather get coding. It turns out, after much agonising, that ZK's existing invariants are already pretty much strong enough to build this protocol. The only extension is the requirement to listen for two different sets of quorum acknowledgements.

          I've deliberately avoided the issue of exposing the view to the outside world (although this requires attention, as new nodes need to be able to find the ensemble!) - I have outlined some ideas earlier in this JIRA and I know other people have good suggestions, but I think we can solve both issues independently.

          Would love to hear comments, things that I've missed, errors in logic etc.
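
          For concreteness, a sketch of how the four proposed operations might be modelled; the opcodes and request record below are hypothetical, not part of the existing wire protocol:

          import java.util.List;

          // Hypothetical membership operations layered on top of ZAB, per the proposal.
          enum MembershipOp {
              GETVIEW,   // read-only: returns the current view and its version
              NEWVIEW,   // replace the whole view (may aggregate many joins/leaves)
              JOIN,      // incremental: add the requesting server to the current view
              LEAVE      // incremental: remove the requesting server from the current view
          }

          // Hypothetical request record: NEWVIEW carries the full member list plus the
          // expected version of the view it replaces (a conditional swap); JOIN and
          // LEAVE only need the server id.
          class MembershipRequest {
              MembershipOp op;
              long serverId;            // used by JOIN / LEAVE
              List<String> newMembers;  // used by NEWVIEW, e.g. "host:quorumPort:electionPort"
              int expectedViewVersion;  // used by NEWVIEW
          }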

          Benjamin Reed added a comment -

          Raghu, i think henry is correct that you must get an ack from quorums in both the old and new views before committing the change. otherwise you get split brain, which could result in multiple leaders. henry, i think we are thinking along the same lines, but i'm a bit skeptical of JOIN and LEAVE. in some sense they are a bit of an optimization that can be implemented with GETVIEW and NEWVIEW. it would be nice to make the mechanism as simple as possible. it also seems like you would require a GETVIEW to be done before doing a NEWVIEW, just for sanity. (require an expected version on NEWVIEW and not allow a -1.) i was thinking that we would just push NEWVIEW through Zab, making sure we get acks from quorums in both the old and new views.

          to help mitigate the case where the system freezes up because the NEWVIEW proposal goes out and there isn't a quorum in the new view, the leader should probably make sure that it currently has a quorum of followers in the new view before proposing the request. if it doesn't, it should error out the request. even with this we can still freeze up if we lose quorum in the new view after issuing the proposal, but that would happen anyway (as you point out); still, it would prevent us from doing something that has no chance of working.

          Flavio Junqueira added a comment -

          In general I like Henry's solution, and I think it works. However, I'm not entirely convinced that we need to augment the protocol with messages such as JOIN and LEAVE. I believe we can make it work by simply writing to a special znode and reading from it, which we need to do anyway if we want to use the mechanisms we have in place for durability. Of course, the leader has to follow changes to this znode and adapt its behavior accordingly (e.g., when sending proposals and commits). Followers, as far as I can tell, only need to register the changes to the znode, as they make no use of such information except for leader election.

          I also agree that there is an authentication problem as we don't want some arbitrary machine trying to join an ensemble.

          If you're willing to share your proof sketches, I would be pleased to take a look at them.

          Mahadev konar added a comment -

          updated the wrong jira!

          Henry Robinson added a comment -

          Great comments, thanks very much.

          @Benjamin:

          You're right that JOIN and LEAVE are special cases of GETVIEW / NEWVIEW. They exist only to allow easy expression of what I reckon will be a common pattern: "add my node to whatever the current cluster is", otherwise as you say you need a GETVIEW for sanity.

          There's an optimisation that would allow JOIN requests to always be committed even though the joiner might have failed - the leader can take it as read that the joiner implicitly acknowledges the JOIN proposal and can therefore push through the commit without requiring an explicit ack from it (you can show that, at least for majority quorums, the only way to have a quorum in an old view but not in the new under a JOIN is for the joiner to fail). However, since the ensemble would have failed immediately afterwards, it wouldn't be much of a win.

          I agree that simple is better; JOIN makes the client programming simpler at the expense of the server (and saves the client a round-trip), but functionally it's subsumed by the getview->newview pattern.

          I like erroring out when NEWVIEW won't result in a live ensemble, or at least having that as an option.

          @Flavio:

          I definitely agree that the persisted state should be in a znode (and should therefore be exposable through watches - important for externalising membership data for bootstrapping). However, since these operations require a very slightly different protocol to normal ZAB, I'm in favour of using the opcode, rather than the identity of the znode, to tell the leader to use a protocol variant - this means that the arguments to 'set' or whatever are kept opaque to the Leader. This is a matter of taste, however - I'm not totally committed to JOIN et al.

          Flavio Junqueira added a comment -

          I think having the messages explicitly in the protocol helps to convey the implemented abstraction, making it easier to read and understand. However, it is bad for backward compatibility, although it might be the case that we silently ignore unknown messages.

          Raghu S added a comment -

          Henry et al., thanks for the feedback on my proposal.

          To explain a bit more on the proposal - I felt that keeping the data and configuration separate, like it is today, would keep the implementation simple and non-intrusive for core ZK code while ensuring correctness. The issue I see with storing config in a znode is that the joiner needs to participate in the NEWVIEW ZAB message even when it is not part of the cluster, and it needs to have the latest log before it can commit the NEWVIEW proposal since it requires writing to the log. At the same time, the leader has to make sure that it blocks all proposals between syncing the joiner and executing NEWVIEW. I felt this could be too intrusive (maybe I shouldn't have worried about this while thinking about a higher level proposal?). Let me know if my understanding is incorrect.

          I don't think my proposal would result in a split brain. I believe there is no need for two phases when changing the cluster configuration as long as each attempt to modify the configuration generates a new version number and any cluster configuration divergence is reconciled during leader election (peers go with the configuration with the highest version number). Having two phases is no better, since there is no guarantee that all/majority peers have committed the new configuration and there could be a diverged view of the cluster configuration during election. Let me know if I am missing something.

          I do believe Henry's proposal would work. I don't have a strong preference for how we would like to do this, as long as we get it right.

          Benjamin Reed added a comment -

          i think i agree with flavio about using setData instead of NEWVIEW. (to be honest we had talked about this approach a while back in private.) we have already reserved the /zookeeper namespace (by convention). so if we use /zookeeper/ensemble to store the ensembles, then we can just use setData and getData to implement NEWVIEW and GETVIEW. This has two nice properties: 1) we don't need to touch the protocol code at all. 2) we can use standard clients to administer the view changes. for example, you could use the cli to get and manipulate the views.

          @raghu: here is the scenario for split brain: you have an ensemble of A, B, C, D, and E. Your new view will be made up of B, C, D, E, and F. if your update only hits D, E, and F, you can have two working ZooKeeper instances: A, B, C and D, E, F, thus giving you split brain.
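          To make the setData/getData idea concrete, here is a hedged sketch using the ordinary ZooKeeper client API. Note that "/zookeeper/ensemble" is only the znode name proposed in this discussion, and the one-line-per-server payload format is invented for illustration - neither is an existing feature.

          import org.apache.zookeeper.ZooKeeper;
          import org.apache.zookeeper.data.Stat;

          import java.nio.charset.StandardCharsets;

          public class EnsembleZnodeSketch {
              public static void main(String[] args) throws Exception {
                  ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

                  // GETVIEW: read the current membership and remember the znode version.
                  Stat stat = new Stat();
                  byte[] current = zk.getData("/zookeeper/ensemble", false, stat);
                  String view = new String(current, StandardCharsets.UTF_8);
                  System.out.println("current view (version " + stat.getVersion() + "):\n" + view);

                  // NEWVIEW: write the new membership conditioned on the version we read,
                  // so concurrent reconfigurations fail instead of silently overwriting each other.
                  String newView = view + "server.6=host6:2888:3888\n";
                  zk.setData("/zookeeper/ensemble", newView.getBytes(StandardCharsets.UTF_8),
                          stat.getVersion());

                  zk.close();
              }
          }

          The conditional setData (expected version, never -1) is what gives the "require an expected version on NEWVIEW" behaviour asked for earlier in the thread.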

          Benjamin Reed added a comment -

          just a caveat to my last comment. for point 1) we actually do need to touch the protocol code a bit to ensure that the setData that changes the view commits in both the old and new views.

          Raghu S added a comment -

          Ben, to be honest, I wasn't thinking of batch addition/deletion. I was thinking we would allow only one node to join or leave the cluster at a time, in which case we won't end up in a split brain.

          One thing I am still missing is, how do we plan to reconcile the divergence in configuration info during leader election if we use ZAB? With ZAB, we go ahead and write to the log as soon as a PROPOSAL is sent. COMMIT is used only to notify the servers that a majority have logged the update and the clients can start reading the new update. So I am not really seeing how this will help configuration change. Now in the example that you bring up, if D, E and F have logged the new view and all the nodes are brought up after a power cycle, a split brain could still occur, no? Should we allow only one node to be added/deleted at a time?

          Benjamin Reed added a comment -

          so if you do one at a time without using Zab (without working through the details):
          1) start with A, B, C, D
          2) A is the leader and proposes LEAVE D and fails where only A and C get it.
          3) B is the leader and proposes LEAVE C and fails where only B and D get it because of a complete power outage.
          4) everything comes back up
          5) A is elected leader by C
          6) B is elected leader by D

          if we use ZAB split brain will not occur because we do not use the configuration until it has been committed. since it has been accepted by both the old and new quorums, we will eventually converge on the new configuration. (that is my conjecture, still needs to be proven)

          Raghu S added a comment -

          Ben, I still believe the split brain won't occur:

          A. After (2), A and C have config version X + 1, B and D are at X
          B. After A dies, a leader election is not possible without C. During LE, B and D discover that C is at X + 1. This will force B and D to update their configuration to X + 1 and restart the election. This is what I refer to when I say "reconciling configuration divergence" in my write up. D now leaves the cluster since it just learnt that it was deleted.
          C. A new quorum is formed with B and C.
          D. When A comes back, the config versions of A, B and C are the same. A will simply join the leader. If A were still at X, then it would first update its configuration to X + 1 when it starts an election and then restart the election.

          Benjamin Reed added a comment -

          oh right. you are correct. i guess it is more of a liveness/correctness issue:

          1) start with A, B, C, D
          2) B is down and A is the leader and proposes LEAVE C and fails where only D gets it.
          3) C and D cannot get quorum since C has an older view.
          4) D fails
          5) A and B come back up and B is elected leader.
          6) B proposes LEAVE A and C gets it before B fails.

          Now what happens? we cannot get quorum with just A and C since A has the old view. even if D comes up it will not elect C because it does not believe C is part of the ensemble. if they all come up either C or D can be elected leader, but if C is elected you end up with conflicting views: A thinks (B, C, D), B thinks (B, C, D), C thinks (B, C, D), and D thinks (A, B, D), so both A and D will effectively be out of the ensemble and you can't tolerate any failures.

          Raghu S added a comment -

          Right, that's a problem; that rules out my proposal.

          Now for some logistics:

          @henry,

          We need this feature pretty desperately and it would be great if you could tell me how we want to go about designing and implementing this. IOW, would you like to own the design and implementation of this? If yes, it would be really great if you could let me know the timeframe. Please don't get me wrong, just trying to figure out the timeframe here. If you can't get to this in the short term, I can chip in. Thanks.

          Henry Robinson added a comment -

          Yes, I'd like to take ownership of implementing this.

          I'd like to have a patch available within one to two weeks. There are some implementation issues to work through which might take time (for example, how do we manage the connections between joining followers and the current leader - who connects to whom?). I see the initial version of the patch simply as adding functionality to the core protocol. Adding any extensions to the client APIs would come in a second revision. Ironing out the kinks in the first patch will also doubtless take some time.

          Does that sound ok? You can go with an unstable implementation as soon as the patch is released.

          Raghu S added a comment -

          That sounds great! I know this is a complex task and a lot of work; we can live with the kinks in the beginning.

          Raghu S added a comment -

          Henry, the JIRA is unassigned. You might want to assign it to yourself.

          Raghu S added a comment -

          Henry, just a question out of curiosity - how do you think the cluster will be created on day one? Would a 2/3 node cluster be built initially by powering on the servers using pre-populated configuration and then nodes are added/deleted dynamically? Or a cluster will be built incrementally using dynamic configuration change, starting with a single node?

          Flavio Junqueira added a comment -

          I suggest that the system starts as a standalone instance and the other replicas join by contacting the standalone replica using the new dynamic membership mechanism. This way we avoid pre-loading a configuration. An important observation is that there will be a transition from standalone to ensemble, which I think won't be difficult to deal with in the code, but we have to make sure that this observation is correct.

          Benjamin Reed added a comment -

          i think if we use the notion of observers it helps: an observer can sync with a leader, but it doesn't get to vote. i think this makes it easy because the leader can then determine that it can commit with both the active followers and active observers if needed: for example start with A, B, C and move to A, B, D, E, F. if A and C are active followers and E and F are observers then the leader will propose the new configuration.

          Flavio Junqueira added a comment -

          +1, I think it is a good idea to use observers (ZOOKEEPER-368). This way we make sure that once the new configuration is committed the new active member is in sync with the leader.

          I have a slightly different idea of how to make it work, though. I was thinking that once the observer finishes synchronizing with the leader, it can simply submit a setData. This way we have no special code path for this operation. Only when finalizing the setData operation do we have to update all the appropriate data structures.

          Henry Robinson added a comment - edited

          As it turns out, I've pretty much implemented Observers in all but name already - they go through the same connection logic as normal followers, and therefore sync, but are barred from sending Leader.REQUEST packets to the leader. Similarly, when a leader is sending a proposal packet it only gets sent to those followers which are in the current view. Since the logic is very similar, and we will be able to distinguish observers from followers by whether they are members of the current view, I haven't duplicated code into Observer* classes.

          I added this when finding that any new follower can join an existing ensemble and issue proposals to it, even if the static configuration of the ensemble does not contain it. This seemed to deadlock the ensemble pretty quickly.

          Edit: of course, this means that Observers can't actually see the payload of a transaction, as per the note on ZK-368. Either the leader sends special packets (INFORM, perhaps) to Observers containing the transaction payload, or the Observers must know not to participate in voting. That said, the Leader will ignore the votes of Observers, but we want to cut down on traffic.

          Flavio Junqueira added a comment -

          That's a great catch, Henry, the one related to having any new (perhaps invalid) follower being able to submit requests. When you start a new follower that is not in the configuration, do you run it as a regular replica and let it find its way, or do you explicitly tell the follower to connect to the leader?

          I'm not sure if we should discuss the details of the observer here or in the other jira, but I'm wondering how an observer is able to find the leader to connect to. The default leader election uses identifiers to connect and form quorums, so I'm not sure a server not in the configuration would be able to determine which replica is the leader. I think we can do it with leader election 0, though, if a leader has been elected and is running.

          Are you planning on having observers as a separate feature, as per ZK-368? It would be great to have it, since you are going through the effort of implementing it already.

          As for the message to observers containing the transaction, the advantage of having a special message (e.g., INFORM) is that we cut down the number of messages to observers: INFORM is essentially a COMMIT containing the request. If we don't change the protocol, then we can just have the leader sending a PROPOSAL to everyone, including the observers. As observers will receive the COMMIT as well, we have higher message complexity. For now, I'm good either way.

          Henry Robinson added a comment -

          I think the issue of how to locate an ensemble whose makeup has changed needs to be discussed separately. I've got an idea for how I'd suggest doing it, but will leave that until I've got the view change stuff working. Once a new leader has been elected, it will need to publish this somewhere (probably both internal to ZK in /zookeeper/ensemble and externally). Observers can use one of those routes to find the leader.

          At the moment, Observers are just followers that a) can't make most mutable proposals b) don't get either PROPOSE or COMMIT messages, just INFORM ones with the payload and c) can propose view changes, not necessarily to include themselves. So an Observer attaches to a leader, syncs and maybe listens in on the proposal stream for a while and then upgrades itself by issuing a view change request.
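          A toy Java sketch of the Observer behaviour described above - never ACKing, never seeing PROPOSAL/COMMIT pairs, and applying only INFORM packets that already carry the committed payload. The Inform class and the in-memory map are simplified stand-ins, not the real quorum packet or DataTree classes.

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          class ObserverSketch {

              // Simplified stand-in for a committed transaction shipped inside an INFORM packet.
              static class Inform {
                  final long zxid;
                  final String path;
                  final byte[] data;
                  Inform(long zxid, String path, byte[] data) {
                      this.zxid = zxid; this.path = path; this.data = data;
                  }
              }

              private final Map<String, byte[]> dataTree = new ConcurrentHashMap<>();
              private long lastProcessedZxid = 0;

              // Apply an INFORM from the leader; no ACK is ever sent back.
              void processInform(Inform inform) {
                  // INFORMs arrive in zxid order over the single leader connection,
                  // so a simple monotonicity check is enough for this sketch.
                  if (inform.zxid <= lastProcessedZxid) {
                      return; // stale or duplicate
                  }
                  dataTree.put(inform.path, inform.data);
                  lastProcessedZxid = inform.zxid;
              }

              // An observer may still submit requests (e.g. a view change asking to be
              // promoted); it forwards them to the leader rather than voting on them.
              void submitToLeader(byte[] request) {
                  // forwarding over the leader connection omitted in this sketch
              }
          }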

          Benjamin Reed added a comment -

          sorry to jump in late here. rather than adding the inform, why don't we just send the PROPOSE and COMMIT to the Observer as normal, and just make the Observer not send ACKs? That way we change as little code as possible with minimum overhead. It also makes switching from Observer to Follower as easy as turning on the ACKs. I also think Observers should be able to issue proposals. One use case for observers are remote data centers that basically proxy clients that connect to ZooKeeper. This means an Observer is just a Follower that doesn't vote (ACK).

          Henry Robinson added a comment -

          That's definitely one way to do it. The other side to that argument is to keep the message complexity down, especially if we can envisage use cases with lots of Observers. A connection to a remote Observer might be more likely to violate the FIFO requirement of ZK connections; having a single-message protocol makes it easier to deal with this case (not a correctness issue of Observers, just annoying if PROPOSALs arrive after COMMITs for some reason). I think that's a marginal issue though. My preference is for INFORM messages as this completely separates Observer logic from Follower logic and doesn't add much complexity to the code.

          The Observer also has to take care not to participate in leader elections. I think Observers also need to announce themselves as such to the Leader, to enable the case where a Follower wishes to connect as an Observer temporarily (otherwise the Leader will think the Observer to be a Follower and use it as part of a quorum). Also if the leader can distinguish between followers and observers then it can treat both differently (e.g. through batching multiple INFORMs or allowing observers to lag by prioritising follower traffic).

          Keeping Observers as special-case Followers would simplify the code for the observers patch (I've got a new version nearly ready to submit, just fixing some tests). However, it would mean that Observers are harder to customise - for example, there's no persistence requirement for an Observer and so some of the RequestProcessors can be optionally removed or replaced by something that only asynchronously writes to disk. Keeping them lightweight has been a goal. My feeling was that I was introducing too many 'if (amObserver()) {...}' branches to an already fairly hard to follow bit of code (in particular Follower.followLeader). Breaking the functionality into two separate classes seems to have made things cleaner.

          Regarding Observers being able to issue proposals; I don't have a problem with that, should be reasonably easy to add.
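          A small sketch of the leader-side distinction discussed above: followers in the current view get the usual PROPOSAL and COMMIT (and must ACK), while learners that announced themselves as observers receive a single INFORM. LearnerLink, the packet-type strings and the send method are invented for this illustration; they are not the real LearnerHandler API.

          import java.util.ArrayList;
          import java.util.List;

          class BroadcastSketch {
              enum LearnerType { FOLLOWER, OBSERVER }

              static class LearnerLink {
                  final long sid;
                  final LearnerType type;
                  LearnerLink(long sid, LearnerType type) { this.sid = sid; this.type = type; }
                  void send(String packetType, long zxid) {
                      System.out.println("-> sid " + sid + ": " + packetType + " zxid=" + zxid);
                  }
              }

              private final List<LearnerLink> learners = new ArrayList<>();

              void propose(long zxid) {
                  for (LearnerLink l : learners) {
                      if (l.type == LearnerType.FOLLOWER) {
                          l.send("PROPOSAL", zxid);   // followers must ACK this
                      }
                      // observers are skipped here; they learn of the txn via INFORM on commit
                  }
              }

              void commit(long zxid) {
                  for (LearnerLink l : learners) {
                      if (l.type == LearnerType.FOLLOWER) {
                          l.send("COMMIT", zxid);     // commit marker only, no payload
                      } else {
                          l.send("INFORM", zxid);     // single packet, payload would ride along
                      }
                  }
              }
          }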

          Flavio Junqueira added a comment -

          I think this discussion is really interesting, but can we move the discussion on the behavior of the observer to ZOOKEEPER-368? I'll add my comments on the last set of comments there.

          Raghu S added a comment -

          Sorry to jump around a bit, I thought I would mention this if we haven't already talked about it. How do we plan to deal with a situation where a set of nodes can form a majority but can't form an ensemble because one or more peers have a grossly outdated configuration? Say an ensemble of ABCDE moved to EFGHI while E was offline, and only EFG are up? They form a majority but can't form an ensemble since E doesn't know about any of the other servers yet.

          One way to address this is to implement an out-of-band synchronization mechanism in which E will realize that the ensemble has changed when F and G try to connect to E, and have one of them synchronize E's logs since their last known zxids are ahead of E's. E can then attempt to restart an election. Also, it is possible that F and G could see different ensembles (F is a bit outdated, G is the most up to date), in which case E might first sync up from F and then both E and F sync up from G if G comes online a bit later.

          Any simpler solutions?

          Henry Robinson added a comment -

          Raghu:

          Your solution is essentially what will happen. F and G will contact E while they are trying to elect a leader. During this process they can all exchange the most recent view that they saw so that E realises the current view. If EFG form a quorum in any view then we can see that either a) it is the latest view or b) at least one of them will know about a later view. Therefore there's also no concern about resurrecting old views.
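          A sketch of the reconciliation rule described here: election messages carry the sender's most recent view (tagged with a version), and a peer that sees a higher-versioned view adopts it and restarts its election round. The View shape and method names below are illustrative, not the real FastLeaderElection wire format.

          import java.util.Set;

          class ViewReconciliationSketch {

              static class View {
                  final long version;
                  final Set<Long> members;
                  View(long version, Set<Long> members) { this.version = version; this.members = members; }
              }

              private View latestView;
              private final long mySid;

              ViewReconciliationSketch(long mySid, View initial) {
                  this.mySid = mySid;
                  this.latestView = initial;
              }

              // Called for every election notification received from a peer.
              // Returns true if the election round should be restarted with the adopted view.
              boolean onNotification(View peerView) {
                  if (peerView.version <= latestView.version) {
                      return false;                  // our view is at least as recent
                  }
                  latestView = peerView;             // adopt the newer configuration
                  if (!latestView.members.contains(mySid)) {
                      shutdown();                    // we were removed in that view
                      return false;
                  }
                  return true;
              }

              private void shutdown() { /* leave the ensemble */ }
          }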

          Patrick Hunt added a comment -

          I've only been following this a bit, and I see bits/pieces in the comments but I'm not sure I follow it all - some questions around the plan wrt manageability:

          1) adding/removing servers - the server itself needs to be configured; are any changes needed to the config on the existing ensemble? I see Raghu has a similar comment on this
          2) JMX - what's the plan? what additional properties/actions will be supported?
          3) 4letter words - same issues as jmx
          4) debug-ability - ensure adequate logging (log4j) on ensemble

          5) security - will an ensemble allow any server to connect to it? today we have ensemble participants hardwired into the config of each of the servers right?

          testing and b/w compat – are we ensuring b/w compat btw this version and previous versions? (I'm probably going to look at beefing up unit & systest next, esp around b/w compat, so would be good to have a better idea where this is headed). IMO this patch must include unit as well as systest before it is committed.

          documentation will be needed as well.

          Perhaps a wiki "proposal" page should be created that will capture the "current proposal" for easy review of this feature? This JIRA can capture ongoing discussion, with agreed upon results captured in the wiki design/functional document. I know it would help me a lot.

          Raghu S added a comment -

          @henry,

          Sorry if this sounds like a repeat - I thought I would summarize the error handling during a view change. Could you comment on whether this makes sense?

          ----------

          1. Configuration change succeeds if the change is successfully committed in both the old view and the new view. An observer is promoted to a follower only after it receives a COMMIT for the new view.

          2. Each peer could have two views of the cluster – the last committed view and the last proposed view (which is created after a VIEWCHANGE proposal is received). The latter can be NULL if there is no view change attempt in progress.
          2.A. Each peer will always attempt an election with the last committed view. Proposed views will be converted to committed views (or deleted) post leader election.
          2.B. The proposal record of a peer contains (in addition to last logged ZXID and server ID) the last committed view of the peer

          3. During election, if the last committed view of the peer with the smaller ZXID (P(ZXLOW)) is different from the last committed view of the peer with the higher ZXID (P(ZXHIGH)), then P(ZXLOW) adopts P(ZXHIGH)'s last committed view and broadcasts the adopted view to all other peers.
          3.A. Two nodes with the same ZXID should have the same committed views
          3.B. If the last committed views of P(ZXLOW) and P(ZXHIGH) are the same, but P(ZXHIGH) has a proposed new view (not committed yet though), that view will not be considered by either peer during election. Similarly, if P(ZXLOW) has a proposed view, that will not be considered either.
          3.C. If P(ZXLOW) adopts P(ZXHIGH)'s last committed view and that view doesn't include P(ZXLOW), P(ZXLOW) drops out of the election (should it self destruct??)

          4. Once a leader is elected, it will sync up the logs of the followers that are lagging behind just like it's done today:

          • If there is a follower whose last committed view is different from the leader's, log synchronization will make sure the follower's last committed view gets updated to be in sync with the leader's. The follower doesn't do anything when its last committed view changes (the new view MUST include the follower, since 3.C prevents a follower that is not in the leading candidate's committed view from successfully completing an election)
          • If there is an observer who upon log synchronization learns that the committed view includes the observer, the observer will promote itself to a follower
          • If a follower with a proposed view joins an already established leader who doesn't know about that proposed view, the follower's proposed view will be erased when the leader synchronizes the follower's log
          • If the leader has a proposed new view in its log, the leader will send a COMMIT for the new view after a majority of peers in the old view and the new view have synced their logs to the leader's log
            4.A. The view change COMMIT doesn't mean much for the followers that are not impacted by the view change
            4.B. The observer that gets view change COMMIT will promote itself to a follower if the new view includes the observer
            4.C. The follower that gets the view change will drop out of the cluster if the new view doesn't include the follower
            4.D. The leader will drop out of the cluster once COMMIT is delivered locally if the new view doesn't include the leader. This will result in a new election.
            4.E. The leader will adjust the quorum size as per the new view otherwise.
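          Since the summary above is essentially a small state machine, here is a hedged Java sketch of the per-peer state it implies: a last committed view, an optional proposed view, and the transitions on the view-change COMMIT (promote, drop out, or adjust quorum). All names are invented for illustration; this is not the eventual ZOOKEEPER-107 implementation.

          import java.util.Set;

          class ViewChangeStateSketch {
              enum Role { LEADER, FOLLOWER, OBSERVER }

              private Role role;
              private final long mySid;
              private Set<Long> committedView;
              private Set<Long> proposedView;     // null when no view change is in flight

              ViewChangeStateSketch(long mySid, Role role, Set<Long> initialView) {
                  this.mySid = mySid;
                  this.role = role;
                  this.committedView = initialView;
              }

              // VIEWCHANGE proposal received: remember it, but keep electing and
              // acking with the committed view (rule 2.A above).
              void onViewChangeProposal(Set<Long> newView) {
                  this.proposedView = newView;
              }

              // COMMIT for the view change delivered (rules 4.A-4.E above).
              void onViewChangeCommit() {
                  if (proposedView == null) {
                      return;                       // nothing in flight; ignore in this sketch
                  }
                  committedView = proposedView;
                  proposedView = null;
                  if (!committedView.contains(mySid)) {
                      shutdown();                   // 4.C / 4.D: removed servers drop out
                  } else if (role == Role.OBSERVER) {
                      role = Role.FOLLOWER;         // 4.B: promoted by the new view
                  }
                  // 4.E: a remaining leader would recompute its quorum size from committedView here.
              }

              private void shutdown() { /* leave the ensemble; a departing leader triggers re-election */ }
          }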

          Quinton Hoole added a comment -

          Any progress on this issue? It seems to have gone very quiet. We have this precise requirement, and will have to solve it one way or another in the coming months.

          Quinton Hoole added a comment -

          OK, not to worry. I just found the answer to my question here:

          http://www.mail-archive.com:80/zookeeper-dev@hadoop.apache.org/msg07382.html

          Vishal Kher added a comment -

          Hi Henry,

          We are using ZK for one of the projects at VMware. We are very much interested in having dynamic membership management. I went through the dev mailing list above. I would like to contribute and develop this feature. It sounds like a fun project.

          Can you please provide an update regarding how far along we are with this and any documentation that you may have? I will start a separate discussion thread regarding this on the dev mailing list instead of having it over the jira.

          Thanks.

          Regards,
          -Vishal

          Andraz Tori added a comment -

          Has anything happened with this feature?

          There was some talk about what the most important use cases are on the mailing list. We're thinking of migrating our home-grown solution to Zookeeper, but can't do it without dynamic addition/removal of the servers. If it helps, here's the use case:

          We have a fully cloud-based solution. Every server that we put into the cluster runs a set of services that make themselves available to a local "resource manager" that shares the list of resources with all other servers in the cluster. When we do upgrades we simply fire up new servers with new versions of the services and connect their resource managers to the old ones in the same cluster. Then we simply shut down the old servers. Besides adding/removing servers when upgrading, we also do the same thing when we need to scale temporarily - we fire up a few more servers and connect their resource managers to the cluster to make the services available to the cluster.
          We never know how many servers there are going to be in the cluster and we don't assign any dns entries to them (just another point of failure).
          The clients that need to know about resources connect to any of the "resource managers" and get a list of all resources available and also learn about the other "resource managers". As servers move around they can also connect to a different "resource manager".

          This is a bit of an unusual configuration since the cloud practically lives on its own without any kind of static addresses. As long as you are able to connect to it at one point in time, you can keep up with its 'motion'.

          So the idea was to migrate the above system to Zookeeper. Every service would connect to local Zookeeper and create ephemeral node announcing it. So every server would run its own Zookeeper node connected to the Zookeeper cloud. However without dynamic addition/removal of the servers all this becomes infeasible.

          Ideally we'd like to have a situation where we just start a Zookeeper node by giving it a list of other known Zookeeper nodes in the cloud. And then it should take on a life of its own.

          Hope that the use case helps. I am really looking forward to this!

          Flavio Junqueira added a comment -

          I have started a wiki page to discuss the design of this feature: http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership

          It is not supposed to replace this jira, but rather to provide a more organized way of working on the design of this feature. Please consider contributing.

          Alexander Shraer added a comment -

          http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership
          Alexander Shraer added a comment -

          I've added a proposed algorithm and some discussion here:

          https://cwiki.apache.org/confluence/display/ZOOKEEPER/ClusterMembership

          Vishal Kher added a comment -

          Great! Thanks for the design. I have a few suggestions/questions below.

          A. Bootstrapping the cluster
          The algorithm assumes that a leader exists prior to doing a reconfig.
          So it looks like the reconfig() API is not intended to be used for bootstrapping
          the cluster. How about we define an API for initializing the cluster?

          Instead of relying on the current method of setting the initial configuration
          in the config file, we should probably also define a join(M) (or
          init(M)) API. When a peer receives this request, it will try to connect to the
          specified peers. During bootstrap, peers will connect to each other (as they do
          now) and elect a leader.
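
          (Purely to make the suggestion concrete, a hypothetical sketch of what such a bootstrap API could look like; the interface, method name, and arguments below are illustrative and not an existing or agreed-upon ZooKeeper interface.)

          import java.net.InetSocketAddress;
          import java.util.List;

          // Hypothetical bootstrap API, sketched only to make the suggestion concrete.
          public interface QuorumBootstrap {
              /**
               * Ask this peer to form (or join) the initial ensemble M.  The peer
               * connects to the listed peers, participates in the initial leader
               * election, and returns once a leader has been established.
               */
              void join(List<InetSocketAddress> initialMembers) throws InterruptedException;
          }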

          B. Storing membership in a znode and updating clients (a topic tangential to this
          design).
          Earlier, ZOOKEEPER-107 proposed using a URI-based approach for the
          clients to fetch the server list. I am not opposed to the URI-based approach;
          however, it shouldn't be our only approach. A URI-based approach requires extra
          resources (e.g., a fault-tolerant web service or shared storage for a file).
          In certain environments it may not be feasible to have such a resource.
          Instead, can we use a mechanism similar to ZooKeeper watches for this?
          Store the membership information in a znode and let the ZooKeeper server inform
          the clients of the changed configuration.
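
          (A minimal client-side sketch of this idea using the existing getData/watch mechanism; the /cluster/membership path is hypothetical and only for illustration.)

          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;
          import org.apache.zookeeper.ZooKeeper;
          import org.apache.zookeeper.data.Stat;

          public class MembershipWatcher implements Watcher {
              // Hypothetical znode where the current ensemble membership would be stored.
              private static final String MEMBERS_ZNODE = "/cluster/membership";

              private final ZooKeeper zk;

              public MembershipWatcher(ZooKeeper zk) throws Exception {
                  this.zk = zk;
                  readAndWatch();
              }

              private void readAndWatch() throws Exception {
                  Stat stat = new Stat();
                  // Read the membership list and leave a watch so the server
                  // notifies us when the configuration changes.
                  byte[] data = zk.getData(MEMBERS_ZNODE, this, stat);
                  System.out.println("config version " + stat.getVersion() + ": " + new String(data));
              }

              @Override
              public void process(WatchedEvent event) {
                  if (event.getType() == Event.EventType.NodeDataChanged) {
                      try {
                          readAndWatch();   // re-read the new membership and re-arm the watch
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  }
              }
          }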

          C. Locating the most recent config on servers
          The URI-based approach will be more helpful to servers. For example, if
          A was down when M = {A, B, C} was changed to M' = {A, D, E}, then when A comes
          online it won't be able to locate the most recent config. In this case, A can
          query the URI. A second approach is to ask the leader to periodically
          send the membership (at least to nodes that are down).
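
          (A rough sketch of the second approach; the ConfigBroadcaster class and Peer callback below are illustrative only, not existing ZooKeeper code.)

          import java.util.List;
          import java.util.concurrent.Executors;
          import java.util.concurrent.ScheduledExecutorService;
          import java.util.concurrent.TimeUnit;

          // Illustrative only: a leader-side task that periodically re-advertises the
          // current configuration (or just its version) to every known peer, so that
          // servers which were down during a reconfig can catch up.
          public class ConfigBroadcaster {
              interface Peer { void sendConfig(long configVersion, String members); }

              private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

              public void start(List<Peer> knownPeers, long configVersion, String members) {
                  timer.scheduleAtFixedRate(() -> {
                      for (Peer p : knownPeers) {
                          p.sendConfig(configVersion, members);   // best effort; failures are ignored
                      }
                  }, 0, 30, TimeUnit.SECONDS);
              }
          }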

          D. "Send a message to all members(M’) instructing them to connect to leader(M)."
          leader(M) can potentially change after sending this message. Should
          this be "Send a message to all members(M') instructing them to connect to members(M)"? See
          point G below.

          Also, in step 7, why do you send leader(M') along with the <activate-config> message?

          E. "3. Submit a reconfig(M’) operation to leader(M)"
          What if leader(M) fails after receiving the request, but before
          informing any other peer? How will the administrative client know whether to
          retry or not? Should s retry if the leader fails, and should the client API retry
          if s fails?

          F. Regarding 3.a and 3.b.

          The algorithm mentions:

          "3. Otherwise:
          a. Designate the most up-to-date server in connected quorum of M' to be leader(M)
          b. Stop processing new operations, return failure for any further ops received"

          Why do we need to change the leader if leader(M) is not in M'? How about we let
          the leader perform the reconfig, and at the end of phase 3 (while processing
          retire) the leader gives up leadership? This will cause a new leader
          election, and one of the peers in M' will become the leader. Similarly, after the
          second phase, members(M') will stop following leader(M) if leader(M) is not in
          M'. I think this will be simpler.

          G. Online VS offline

          In your "Online vs offline" section, you mentioned that the offline
          strategy is preferred. I think we can do the reconfiguration online.
          I picture M' as modified OBSERVERS until the reconfiguration is complete.

          a. Let new members(M') join the cluster as OBSERVERS. Based on the current
          implementation, M' will essentially sync with the leader after becoming
          OBSERVERS and it will be easier to leverage the RequestProcessor pipeline for
          reconfig.

          b. Once a majority of M' join the leader, the leader executes phase 1 as you
          described.

          c. After phase 1, the leader changes the transaction logic a bit. Every
          transaction after this point (including reconfig-start) will be sent to M and
          M'. The leader then waits for quorum(M) and quorum(M') to ack the transaction. So
          M' are not pure OBSERVERS as we think of them today. However, only a member of M can
          become a leader until the reconfig succeeds. Also, M' - (M n M') cannot serve
          client requests until reconfiguration is complete. By doing a transaction on
          both M and M' and waiting for the quorum of each set to ack, we keep transferring
          the state to both configurations.

          d. After receiving the ack for phase 2, the leader sends a <switch-to-new-config>
          transaction to M and M' (instead of sending retire just to M).

          After receiving this message, M' will close connections with (and reject
          connections from) members not in M'. Members that are supposed to leave the
          cluster will shut down QuorumPeer. If leader(M) is not in M', then a new
          leader(M') will be elected.

          Let me know what you think about this.
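
          (To make the dual-quorum acknowledgement in step c concrete, a minimal sketch of the commit rule it implies; illustrative code only, not ZooKeeper's actual quorum-verification machinery.)

          import java.util.HashSet;
          import java.util.Set;

          // Illustrative commit rule for step c: during reconfiguration a proposal is
          // committable only when it has been acked by a quorum of the old ensemble M
          // *and* a quorum of the new ensemble M'.
          public class DualQuorumTracker {
              private final Set<Long> oldMembers;   // server ids in M
              private final Set<Long> newMembers;   // server ids in M'
              private final Set<Long> acks = new HashSet<>();

              public DualQuorumTracker(Set<Long> oldMembers, Set<Long> newMembers) {
                  this.oldMembers = oldMembers;
                  this.newMembers = newMembers;
              }

              public void ack(long serverId) {
                  acks.add(serverId);
              }

              public boolean committable() {
                  return hasQuorum(oldMembers) && hasQuorum(newMembers);
              }

              private boolean hasQuorum(Set<Long> members) {
                  long acked = members.stream().filter(acks::contains).count();
                  return acked > members.size() / 2;   // strict majority
              }
          }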

          H. What happens if a majority of M' keeps failing during reconfig?

          M = {A, B}, M' = {A, C, D, E}. What if D and E fail?

          Failure of a majority of M' will permanently stall the reconfig. While this is
          less likely to happen, I think ZooKeeper should handle it
          automatically. After a few retries, we should abort the reconfig. Otherwise, we
          could disrupt a running cluster and we would never be able to recover without
          manual intervention. If the majority fails after phase 1, then this would mean
          sending an <abort-reconfig, version(M), M'> to M.

          Of course, one can argue - what if a majority of M' fails after phase 3? So I am not
          sure if this is overkill, but I feel we should handle this case.

          I. "Old reconfiguration request"

          a. We should use ZAB.
          b. A side note - I think ZOOKEEPER-335 needs to be resolved for reconfig to
          work. This bug causes logs to diverge if the ZK leader fails before sending
          PROPOSALs to followers (see
          http://www.mail-archive.com/zookeeper-user@hadoop.apache.org/msg00403.html).

          Because of this bug we could run into the following scenario:

          • A peer B that was leader when reconfig(M') was issued will have reconfig M'
            in its transaction log.
          • A peer C that became leader after B's failure can have reconfig(M'') in its
            log.
          • Now, if B and C fail (say both reboot), then the outcome of the reconfig will
            depend on which node takes leadership. If B becomes the leader, the outcome
            is M'; if C becomes the leader, the outcome is M''.

          J. Policies/ sanity checks

          • Should we allow removal of a peer in a 3-node configuration? How about in a
            2-node configuration?

          Can you please add section numbers to the design paper? It will be easier to
          refer to the text by numbers.

          Thanks again for working on this. We are about to release the first version of
          our product using ZooKeeper, which uses static configuration. Our next version
          is heavily dependent on dynamic membership. We will be happy to chip in from our
          side to help with the implementation.

          Alexander Shraer added a comment -

          Hi Vishal,

          Thanks a lot for your comments and for offering to help. I have some experience with reconfigurable solutions; however,
          I'm new to the ZooKeeper project and your help will be much appreciated. I suggest we start by figuring out what we'd
          like to do, and then we can decide who does what.

          As you suggested, I numbered the sections so they are easier to refer to. Some of the things you point out are very important, but I think they are separate from the core design, so we can address them separately.

          A. I agree that bootstrapping the cluster should probably be done this way, but this seems like a separate issue - in this case the reconfiguration is very simple since the state is empty and we can block until the configuration connects. One related thing - in my proposal I suggested that the server that gets the reconfig request from a client sends a message to members(M') instructing them to connect to leader(M). This can be done by invoking a join(M) command on the servers.

          B. This is an important issue and I definitely agree we should discuss this, but I think this is also orthogonal to any solution we choose now.

          C. Agreed, the second approach looks good (the leader periodically broadcasts the membership, or at least the configuration id).

          D. The purpose of connecting is only to begin state transfer, so only connecting to the leader matters. For example, there is no need
          for members(M) to detect failure of new servers that are trying to connect. If the leader fails then the new leader can be responsible for establishing connections to members(M') as part of completing the reconfiguration (see first step in Section 4.2).

          > Also, in step 7, why do you send leader(M') along with <activate-config> message?

          The idea is that there is no need to explicitly run leader election in M' - the old leader can appoint one of members(M') to be the new leader (the new leader should be chosen s.t. it is up-to-date with M). This way leader election in M' is only required if the designated leader fails. Including leader(M') in the phase-3 message instructs it to start acting as leader of M'.

          E. This is very related to ZOOKEEPER-22 (how does a client know whether its operation has been executed if the leader failed)?
          In this case, though, it should be easy to detect whether the reconfiguration was completed or not - the recovery procedure will have the property that when a new leader is established, either the new leader has observed the reconfiguration attempt and completed it, or no future leader will (if we fix ZOOKEEPER-335).

          We should provide a way to query for the current configuration, both for the issue you mention and also for the non-incremental reconfig API (I suggest that a client invoking a reconfig request include the version of the current config in the request; if this config is no longer current, the request should be aborted - see Section 4.3).
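
          (A sketch of the version-check idea; the client-side interface, method, and exception below are hypothetical, not an existing ZooKeeper API.)

          import java.util.List;

          // Illustrative only: a non-incremental reconfig request that carries the
          // version of the configuration the caller believes is current.  The server
          // rejects the request if that version is no longer the active one, so a
          // stale administrator cannot clobber a newer configuration.
          public interface ReconfigClient {
              final class StaleConfigException extends Exception { }

              /**
               * Replace the whole membership with newMembers, but only if the current
               * configuration still has version expectedConfigVersion.
               *
               * @throws StaleConfigException if the configuration changed in the meantime
               */
              void reconfigure(List<String> newMembers, long expectedConfigVersion)
                      throws StaleConfigException, InterruptedException;
          }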

          F. This is exactly what I meant - I think you didn't notice the "’" (it says "to be leader(M’)").

          G. I agree that online would be better. I initially thought that it's not possible in this case, but now I think there's a way.

          > a. Let new members(M') join the cluster as OBSERVERS. Based on the
          > current implementation, M' will essentially sync with the leader after becoming
          > OBSERVERS and it will be easier to leverage the RequestProcessor pipeline for reconfig.

          This was also my intention - to let them connect to the leader and perform state transfer before the leader is requested to reconfigure.

          > b. Once a majority of M' join the leader, the leader executes phase 1 as you described.

          I think a reconfig operation should be sent to the leader only after a majority of M' has connected to it. This way we avoid waiting
          for this to happen in the leader code. The leader can check whether a majority is connected and not too far behind with regard to
          state transfer, and abort otherwise (line 1 in the algorithm).

          > c. After phase 1., the leader changes the transaction logic a bit. Every transaction after this point (including reconfig-start) will be sent to M and M'.
          > Leader then waits for quorum(M) and quorum(M') to ack the transaction. So M' are not pure OBSERVERS as we think today. However, only a member of
          > M can become a leader until reconfig succeeds. Also, M' - (M n M') cannot serve client requests until reconfiguration is complete. By doing a
          > transaction on both M and M' and waiting for the quorum of each set to ack, we keep transferring the state to both the configurations.
          > d. After receiving ack for phase 2, the leader sends <switch-to-new-config> transaction to M and M' (instead of sending retire just to M).
          > After receiving this message, M' will close connections with (and reject connections from) members not in M'. Members that are supposed to
          > leave the cluster will shutdown QuorumPeer.

          Here are some issues that I see with your proposal:

          1. Suppose that leader(M) fails while sending phase-3 messages and all phase-3 messages arrive to members(M) but none arrive to members(M'). Now the system is stuck: M' - (M n M') cannot serve client requests whereas members(M) are no longer in the system.
          This is why I think before garbage-collecting M we must make sure that M' is operational by itself.

          2. In my proposal members(M') can accept operations as soon as they get the phase-2 message, which is the purpose of phase 2.
          I don't see why phase 2 is needed in your proposal. I suggest we stick with 3 phases s.t. the phase-2 message allows M' to be independent.

          3. Because of the Global Primary Order property of ZAB, we can't allow a commit message to arrive from leader(M) to members(M') after they are allowed to process
          messages independently in M'. Your solution handles this because members(M') close connections to leader(M) as soon as they're allowed to process messages independently, but then it's not clear how the outstanding ops get committed. In order to commit these operations nevertheless, I suggest that this be the task of leader(M').

          Here's what I suggest:

          • After leader(M) sends the phase-1 message to members(M), it starts sending every newly received op to both members(M) and members(M') (treating them as followers). Commit messages must not be sent to members(M') by leader(M); sending such messages to members(M) does not hurt but is not necessary. In any case, clients are not acked for these ops.
          • As soon as enough acks for the phase-1 message are received from members(M), leader(M) sends a phase-2 message to members(M') and from this moment on doesn't accept new ops (unless leader(M) is also in M' and therefore also acts as leader(M')).
          • When members(M') receive the phase-2 message from leader(M), they send an ack both to leader(M) and to leader(M') - in practice a member can ack leader(M) first, then disconnect from it, then connect to leader(M') and ack it, but the order doesn't matter for correctness. When leader(M') receives this ack from a quorum of members(M') (as well as the phase-2 message from leader(M)), it can commit all preceding ops in M'.
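
          (A compressed sketch of the handoff described in the bullets above, with networking abstracted behind a toy Transport interface; everything here is illustrative pseudocode, not ZooKeeper's actual leader code.)

          import java.util.Set;

          // Illustrative pseudocode for the leader(M) side of the handoff:
          // phase 1 replicates to both ensembles, phase 2 hands control to leader(M').
          public class OldLeaderHandoff {
              interface Transport {
                  void sendToAll(Set<Long> servers, String message);
                  boolean quorumAcked(Set<Long> servers, String message) throws InterruptedException;
              }

              private final Transport transport;
              private final Set<Long> membersM;       // old ensemble
              private final Set<Long> membersMPrime;  // new ensemble
              private volatile boolean acceptingOps = true;

              public OldLeaderHandoff(Transport t, Set<Long> m, Set<Long> mPrime) {
                  this.transport = t;
                  this.membersM = m;
                  this.membersMPrime = mPrime;
              }

              public void onNewOp(String op) {
                  if (!acceptingOps) {
                      throw new IllegalStateException("reconfiguration in progress; op rejected");
                  }
                  // During phase 1, proposals go to both M and M'; commits only to M.
                  transport.sendToAll(membersM, "PROPOSAL " + op);
                  transport.sendToAll(membersMPrime, "PROPOSAL " + op);
              }

              public void reconfigure() throws InterruptedException {
                  transport.sendToAll(membersM, "PHASE1 reconfig");
                  if (transport.quorumAcked(membersM, "PHASE1 reconfig")) {
                      // Phase 2: activate M' and stop accepting new ops here (unless this
                      // leader is itself in M'); leader(M') commits the outstanding ops
                      // once a quorum of M' has acked.
                      acceptingOps = false;
                      transport.sendToAll(membersMPrime, "PHASE2 activate");
                  }
              }
          }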

          The thing is, until ZOOKEEPER-22 is fixed, the fact that we're processing ops in M' doesn't really help, since the clients won't be able to know that the ops got committed.

          H. I agree - it's overkill. We can make sure that a quorum of M' is connected when the reconfiguration starts.

          I. I agree - it is important to fix ZOOKEEPER-335 (as well as ZOOKEEPER-22).

          J. As long as a quorum of both the old and the new config is up during the reconfig, it seems like we can do the reconfig (e.g., the majority of 2 is 2). Currently a warning is issued if there are 2 nodes in a configuration; we can do that too. Actually, we need to define liveness more carefully - it's on my ToDo list.

          Best Regards,
          Alex

          Alexander Shraer added a comment -

          I updated the algorithm and section 4.1, and added some points for discussion as section 4.7.

          https://cwiki.apache.org/confluence/display/ZOOKEEPER/ClusterMembership

          Vishal Kher added a comment -

          Hi Alex,

          Thanks for the updated design. Some comments below. Some of these might be
          a repetition of what we discussed earlier. I am adding them here so that we
          have them on the jira.

          D. The purpose of connecting is only to begin state transfer, so only
          connecting to the leader matters. For example, there is no need for members(M)
          to detect failure of new servers that are trying to connect. If the leader
          fails then the new leader can be responsible for establishing connections to
          members(M') as part of completing the reconfiguration (see first step in Section 4.2).

          What will M' do if leader(M) fails? We could run into a scenario where M'
          doesn't know about leader(M) and leader(M) may not know about M'. So it might
          be easier to ask M' to connect to M. Essentially, we need a way to figure out
          who the current leader is. We could potentially use the virtual-IP-based
          approach that I mentioned earlier: let the leader run a "cluster IP", and
          anyone (including ZK clients) that wants cluster/leader information can
          send a query to this IP.

          the old leader can appoint one of members(M') to be the new leader (the new leader
          should be chosen s.t. it is up-to-date with M). This way leader election in M'
          is only required if the designated leader fails.

          I am not in favor of this approach. May I suggest that, from an implementation
          perspective, we leave this
          in the future work/improvements section? It certainly is a good optimization, but
          I don't see it giving us significant benefits. IMHO, it seems a bit counter-intuitive
          for node x to designate node y as the leader in a distributed protocol
          that is designed to elect leaders. I find it easier to think "elect a leader if
          you need one". I am also concerned that this is going to make the
          implementation more complex and prone to strange corner cases (and harder to test as well).

          E. This is very related to ZOOKEEPER-22 (how does a client know whether its
          operation has been executed if the leader failed) ?

          Agreed. However, the client library attempts to reconnect to the servers.
          Also, the application can verify if the transaction is done when it
          reconnects next time. We may have to do something similar as well.

          reconfiguration attempt and completed it or no further leader will (if we fix
          ZOOKEEPER-335 )

          Why should we not fix ZOOKEEPER-335? Won't the log divergence cause an
          unpredictable outcome of the reconfiguration? The log of A has M' and the log of B has M''.
          Depending upon who wins the election, the configuration will be either M' or M''.

          I think a reconfig operations should be sent to the leader only after a
          majority of M' has connected to it.

          Sure, though (as you pointed out) this is not strictly necessary for the
          correctness of the protocol.

          Here are some issues that I see with your proposal:

          1. Suppose that leader(M) fails while sending phase-3 messages and all phase-3 messages arrive to members(M) but none arrive to members(M'). Now the system is stuck: M' - (M n M') cannot serve client requests whereas members(M) are no longer in the system.
          This is why I think before garbage-collecting M we must make sure that M' is operational by itself.

          Could this be fixed by sending the message to M' first and then sending to M
          after receiving ack from majority of M'

          2. In my proposal members(M') can accept operations as soon as they get the phase-2 message, which is the purpose of phase 2.
          I don't see why phase-2 is needed in your proposal. I suggest to stick with 3 phases s.t. phase-2 message allows M' to be independent.

          3. Because of Global Primary Order property of ZAB we can't allow a commit message to arrive from leader(M) to members(M') after they are allowed to process
          messages independently in M'. Your solution handles this because members(M') close connections to leader(M) as soon as they're allowed to process messages independently, but then its not clear how the outstanding ops get committed. In order to commit these operations nevertheless so I suggest that this would be the task of leader(M').

          Here's what I suggest:

          • after leader(M) sends phase-1 message to members(M) it starts sending every received new op to both members(M) and members(M') (treating them as followers). Commit messages must not be sent to members(M') by leader(M) and sending such messages to members(M) does not hurt but is not necessary. In any case, clients are not acked regarding these ops.
          • As soon as enough acks are received for the phase-1 message from members(M), leader(M) sends a phase-2 message to members(M') and from this moment doesn't accept new ops (unless leader(M) is also in M' and therefore acts also as leader(M')).
          • When members(M') receive the phase-2 message from leader(M) they send an ack both to leader(M) and to leader(M') - in practice it can send an ack to leader(M) first, then disconnect from it and then connect to leader(M') and send an ack to it but it doesn't matter for correctness. When leader(M') receives this ack from a quorum of members(M') (as well as the phase-2 message from leader(M)) it can commit all preceding ops in M'.

          There are subtle differences between the protocol that I suggested and your
          proposal. I am not fully convinced if the changes are necessary. I might be
          missing something here. I will get in touch with you offline for further
          clarification.

          There are a few problems in the revised protocol posted on the wiki:

          1. leader(M) does not wait for confirmation of ongoing transactions from a
          majority of M' (during phase 1). How do you guarantee that once M' starts
          leader election, all the transactions that are committed to M are also committed
          to M'? A majority of M' might be lagging behind, and one of them might end up
          becoming leader(M').

          2. Why is step 8 ("Stop processing new operations, return failure for any
          further ops received") necessary? The protocol that I suggested does not reject
          any transactions from the clients. In the most common case of reconfiguration,
          only a subset (very likely 1) of the peers would be added/removed, so the ZK
          client library can reconnect to one of the other ZK servers using the same
          code that it uses today. As a result, the application will not even notice that
          a reconfiguration has happened (other than potentially receiving notifications
          about the new configuration).

          H. I agree - it's an overkill We can make sure that a quorum of M' is connected
          when the reconfiguration starts.

          What should we tell the administrator to do if a majority of M' fails during
          reconfiguration? During normal operations, if a majority of nodes fail, the
          admin has the choice of copying the DB from one of the live nodes to the rest of the
          nodes and getting the cluster going immediately. There is a risk of losing some
          transactions, but there is also a chance that one of the nodes has a
          reasonably up-to-date copy of the data tree. However, during reconfiguration,
          if a majority of M' fails, the cluster is unrecoverable even if a majority of M is
          online. Are we going to assume that the admin needs to take a backup before doing
          the reconfig?

          Thanks.
          -Vishal

          Alexander Shraer added a comment -

          Hi Vishal,

          Thanks for the comments.

          > What will M' do if the leader(M) fails?

          It depends at what stage. Before phase 2, M' does nothing - M will elect a new leader, and that leader
          will either complete the reconfiguration, if it finds traces of it stored by phase 1 in M, or otherwise
          abandon it (in which case s or the client will perhaps re-submit it). As part of the recovery protocol (see the twiki),
          the new leader instructs members(M') to connect to it, similarly to the way s originally instructed them to connect to
          the old leader(M).

          If leader(M) fails after a phase-2 message has arrived at some process in M', that process will forward the message
          to the rest of M'. If this process fails before/during forwarding, this means that M still has a quorum and we're in the same
          case as described above. Otherwise, members(M') will get this message and run leader election. According to what
          I suggested, leader election will not be run if the appointed leader(M') is up. Anyway, I think that after getting the
          phase-2 (activation) message, a process in M' will not agree to connect to M; instead, it tries to convince the rest of M' to start running.

          > appointing a leader

          This seems like an important optimization that can be done easily: since the leader has a quorum of responsive members from M', it can just pick one of them and skip leader election. All this means is that before running leader election in M', a process will attempt to connect to the appointed leader(M'), and if this fails they'll still run leader election. Admittedly, I don't yet fully understand the code,
          so I might be wrong and this might in fact turn out to be too complicated.

          > clients

          We were discussing this point with Flavio a bit, and there are some initial ideas. In any case, we need some sort of DNS as a fallback for clients that were disconnected during the reconfiguration - when they wake up, there might no longer be anyone from M alive.

          > Why should we not fix ZOOKEEPER-335?

          We should, but I'm currently focusing on the main reconfiguration stuff.
          If you can work on this bug of course that would be great.

          > Could this be fixed by sending the message to M' first and then sending
          > to M after receiving ack from majority of M'

          Yes, and this is exactly how the algorithm in the twiki is structured - phase 2 contacts M',
          and only after a quorum acks does (the optional) phase 3 garbage-collect M.

          > I will get in touch with you offline for further clarification.

          Sure.

          > 1. leader(M) does not wait for confirmation of ongoing transactions
          > from majority of M' (during phase 1). How do you guarantee that once M'
          > starts leader election all the transactions that are committed to M are also
          > committed to M? A majority of M' might be lagging behind and one of them might
          > end up becoming the leader(M').

          Because of FIFO, and since members(M') are connected as followers from the beginning of the reconfiguration,
          every process in M' that gets the activation message from leader(M) has previously received all the transactions from leader(M).
          The first thing that leader(M') does when a quorum of M' connects to it is commit all these transactions in M'.
          So I think my proposal is correct. Having said that, we might want to commit the transactions in M' if we want to
          transfer clients to M' gradually, as suggested by Flavio.

          > 2. Why is Step 8. ("Stop processing new operations, return failure for
          > any further ops received") necessary?

          In the general case we cannot process operations in M once M' has been activated, as leader(M') may
          process operations in the new configuration M' (otherwise we may get split-brain).
          Of course, if leader(M) is in M', there is no need to stop processing operations.

          > What should we tell the administrator to do if majority of M' fail during
          > reconfiguration? During normal operations, if a majority of nodes fail, then
          > the admin has a choice to copy the DB from one of the live nodes to rest of the
          > nodes and get the cluster going immediately. There is a risk of loosing some
          > transactions, but there is also a chance that the one of the node has
          > reasonably up-to-date copy of the data tree. However, during reconfiguration
          > if majority of M' fail the cluster is unrecoverable even if majority of M are
          > online. Are going to assume that the admin needs to take a backup before doing
          > the reconfig?

          I didn't really understand why the cluster is unrecoverable even if a majority of M is online.
          I think it depends on when the crash happens. If a quorum of M' cannot connect before the reconfiguration begins,
          we can abort the reconfiguration. If they fail during phase 1, we can continue in M and shouldn't start M' until a quorum of M' is up.
          If they fail during phase 2, leader(M) will not be able to complete phase 2, so M will not be garbage-collected, although the system is stuck. If phase 2 completes, then M' is active and a quorum of M' has all the state, so M can be garbage-collected. Here, I don't see the difference from a normal execution without reconfigurations.

          Best Regards,
          Alex

          Show
          Alexander Shraer added a comment - Hi Vishal, Thanks for the comments. > What will M' do if the leader(M) fails? It depends at what stage. Before phase 2, M' does nothing - M will elect a new leader and that leader will either complete the reconfiguration if it find traces of it stored by phase-1 in M, or otherwise will abandon it (in which case s or the client will perhaps re-submit it). As part of the recovery protocol (see twiki), the new leader instructs members(M') to connect to it, similarly to the way s originally instructed them to connect to the old leader(M). If leader(M) fails after a phase-2 message arrived to some process in M', that process will forward the message to the rest of M'. If this process fails before/during forwarding, this means that M still has a quorum and we're in the same case as described above. Otherwise, members(M') will get this message and run leader election. According to what I suggested, leader election will not be run if the appointed leader(M') is up. Anyway, I think that after getting the Phase-2 (activation) message, a process in M' will not agree to connect to M, instead it tries to convince the rest of M' to start running. > appointing a leader This seems like an important optimization that can be easily done since the leader has a quorum of responsive members from M', he can just pick one of them and save leader election. All this means is that before running leader election in M' a process will attempt to connect to the appointed leader(M'), and if this fails they'll still run the leader election. Admittedly, I don't yet fully understand the code, so I might be wrong and this might in fact turn out to be too complicated. > clients We were discussing this point with Flavio a bit, and there are some initial ideas. In any case we need some sort of DNS as a fallback, for clients that were disconnected during the reconfiguration - when they wake up there might no longer be anyone from M alive. > Why should we not fix ZOOKEEPER-335 ? We should, but I'm currently focusing on the main reconfiguration stuff. If you can work on this bug of course that would be great. > Could this be fixed by sending the message to M' first and then sending > to M after receiving ack from majority of M' Yes, and this is exactly how the algorithm in the twiki is structured - phase 2 contacts M' and only after a quorum acks, (the optional) phase 3 garbage-collects M > I will get in touch with you offline for further clarification. Sure. > 1. leader(M) does not wait for confirmation of ongoing transactions > from majority of M' (during phase 1). How do you guarantee that once M' > starts leader election all the transactions that are committed to M are also > committed to M? A majority of M' might be lagging behind and one of them might > end up becoming the leader(M'). Because of FIFO and since M' are connected as followers from the beginning of the reconfiguration, every process in M' that gets the activation message from leader(M) had previously received all the transactions from leader(M). The first thing that leader(M') does when a quorum of M' connects to it is commit all these transactions in M'. So I think my proposal is correct. Having said that, we might want to commit the transactions in M' if we want to transfer clients to M' gradually, as suggested by Flavio. > 2. Why is Step 8. ("Stop processing new operations, return failure for > any further ops received") necessary? 
In the general case we cannot process operations in M once M' was activated as leader(M') may process operations in the new configuration M' (otherwise we may get split-brain). Of course, if leader(M) is in M' there is no need to stop processing operations. > What should we tell the administrator to do if majority of M' fail > during > reconfiguration? During normal operations, if a majority of nodes fail, > then > the admin has a choice to copy the DB from one of the live nodes to > rest of the > nodes and get the cluster going immediately. There is a risk of loosing > some > transactions, but there is also a chance that the one of the node has > reasonably up-to-date copy of the data tree. However, during > reconfiguration > if majority of M' fail the cluster is unrecoverable even if majority of > M are > online. Are going to assume that the admin needs to take a backup > before doing > the reconfig? I didn't really understand why the cluster is unrecoverable even if majority of M are online. I think it depends when the crash happens. If a quorum of M' cannot connect before reconfiguration begins, we can abort the reconfiguration. If they fail during phase-1, we can continue in M and shouldn't start M' until quorum M' is up. If they fail during phase-2, leader(M) will not be able to complete phase-2, so M will not be garbage-collected, although the system is stuck. If phase-2 completes then M' is active and a quorum of M' has all the state, and M can be garbage-collected. Here, I don't see the difference from a normal execution without reconfigurations. Best Regards, Alex
          Vishal Kher added a comment -

          > We were discussing this point with Flavio a bit, and there are some initial ideas. In any case we need some sort of DNS as a fallback, for clients that were disconnected during the reconfiguration - when they wake up there might no longer be anyone from M alive.

          Servers will need it too. A peer may be part of M', but may not know about it.

          The mechanism for this should be internal to the cluster. If this requires making entries in DNS, then in practice no administrator will allow that. To run something internal to the cluster, you will need a fail-safe way to access that central resource. As we discussed, using a cluster IP (ZOOKEEPER-1031) and running the resource on the node with the cluster IP will help in this scenario. We shouldn't rely on that IP being up to bootstrap the cluster (to form a quorum), for obvious reasons.

          > process operations in the new configuration M' (otherwise we may get split-brain).

          Earlier, when I implemented a membership change algorithm, I rejected reconfig(M') if a majority of M did not belong to a majority of M'. In short, change at most (majority - 1) nodes in one reconfig. It is not a strict requirement here, but I think it is worth a thought.
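
          For illustration only, here is a rough sketch of that overlap constraint (the class and method names are hypothetical, not taken from any patch): accept a reconfiguration only if the servers common to M and M' could form a quorum in both configurations.

          import java.util.HashSet;
          import java.util.Set;

          // Hypothetical helper: only allow reconfig(M') if some set of servers
          // is simultaneously a majority of the old membership M and of the
          // proposed membership M'.
          public final class ReconfigOverlapCheck {

              private static int majority(int size) {
                  return size / 2 + 1;
              }

              public static boolean hasCommonQuorum(Set<Long> oldMembers, Set<Long> newMembers) {
                  Set<Long> common = new HashSet<Long>(oldMembers);
                  common.retainAll(newMembers);  // servers that appear in both M and M'
                  return common.size() >= majority(oldMembers.size())
                          && common.size() >= majority(newMembers.size());
              }
          }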

          > Because of FIFO and since M' are connected as followers from the beginning of the reconfiguration,

          Ah! This was the missing link. So M' are considered as followers. I missed that in our description on the twiki. From "New operations (received by leader(M) during phase-1) will be sent to both M and M'" it looked like the leader is just sending messages to M', but not waiting for a majority to ack.

          > Having said that, we might want to commit the transactions in M' if we want to
          > transfer clients to M' gradually, as suggested by Flavio.

          That's a good point. One simple way to do this is to let s use the incremental API if there are too many clients. Replacing servers one by one will result in a gradual transfer of clients.

          Also, after step 7 of the algorithm (before phase 2), how about we ask leader(M) to do a transaction using Zab (or consider reconfig-start as that transaction) to write out M' in the ZooKeeper tree? Each server can then send a notification to its clients to indicate a change in progress. Clients can disconnect from M and start connecting to M'.

          > Here, I don't see the difference from a normal execution without reconfigurations.

          Agreed.

          Thanks.

          Quinton Hoole added a comment -

          Yup, thanks, I'm also on this thread. Need to catch up on some reading though ;-)

          Q

          Vishal Kher added a comment -

          Hi Alex,

          Getting back to logistics, can you please set a target date for the availability of this feature? It need not be bullet-proof; a simpler version that enables us to add/delete a peer would also be a big step forward.

          Thanks.
          -Vishal

          Hari A V added a comment -

          Hi all,

          Considering ZOOKEEPER-646, which tries to improve write throughput (although that issue is also not in progress), dynamically adding ZK servers to a cluster gains one more use case: increasing the write throughput of the cluster dynamically, which may be very interesting for a consistent data store.

          - Hari
          yogesh kumar jha added a comment -

          Hi all,

          Can anyone give an update on the current status of the solution to this issue and how much progress has been made?

          Alexander Shraer added a comment -

          Hi,

          I'm working on this feature and hope to get it into the 3.5.0 release. The main functionality already works; I intend to update the twiki with details soon.

          Alex

          yogesh kumar jha added a comment -

          Thanks for the reply, Alex. Hope to get the update soon...

          Marshall McMullen added a comment -

          Alex, curious what the status of this is now that a couple more months have gone by. The company I work for would really love to have dynamic cluster membership, and we're willing to help out in any way we can to get this across the finish line. I wrote all the ZKMulti code, so I'm pretty familiar with ZooKeeper internals. I'd love to pitch in and help in any way possible to get this implemented as soon as possible, and if you need any help testing or debugging I'd be happy to assist there as well. Let me know what you think...

          --Marshall

          Alexander Shraer added a comment -

          Hi Marshall,

          I'm pretty much done with the server-side implementation and have changed the client Java & C APIs and shells. I would definitely appreciate any help in writing tests for this. I can send you a patch with all my changes, and I also have a preliminary writeup explaining them. I'm currently doing some testing to see the performance impact of reconfiguration (there is no impact unless it is actually invoked). There is still some work needed on the client side of reconfiguration, but I don't think we have fully converged on what we want to do yet (there were some discussions between Vishal K., Ben, Flavio and myself).

          Thanks,
          Alex

          Mahadev konar added a comment -

          Alex,
          I'd suggest uploading the preliminary patch to the JIRA. That way folks can try it out and also give early feedback. I suspect this will be a huge change, and more eyes on it will make it better.

          Patrick Hunt added a comment -

          Uploading the patch to reviews.apache.org would enable more feedback as well.

          Marshall McMullen added a comment -

          Alex, that's awesome news! Really glad to see that you're pretty much done with the server-side implementation. I would be extremely happy to help with writing tests and help fix any server- or client-side bugs that crop up. My company is extremely interested in this and they look to me to be the ZooKeeper expert... so I definitely want to get involved in this as soon as possible to flesh out all the details, get it integrated into our environment, and start doing extensive testing.

          Please do send me a patch and any preliminary write-up you have and I'll dive in and start testing the hell out of it.

          Or as Patrick suggests, upload the patch to this issue so others can look at it as well.

          Alexander Shraer added a comment -

          Thanks all for the suggestions. I hope to upload a patch later this week.

          Alexander Shraer added a comment -

          patch for preliminary review

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12493985/zoo_replicated1.members
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/522//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          In the patch, the configuration file was split into 2 parts - attached are sample files zoo_replicated1.members and zoo_replicated1.cfg.

          The cfg file points to the members file, and the members file has all information about membership and ports. You must specify all client ports in that file - this info is separated by ";" from the usual server info (client ports were made part of the configuration info so that when clients find out about new configurations they know how to connect to the servers). The cfg file can still include client ports (or hostname and port), but if given they must be the same as in the members file.

          There are 2 new commands - "config", which gets the current configuration, and "reconfig", which changes the configuration. config has just one option, "-c", which returns only the version of the configuration and the connection string of interest to clients. reconfig has many options, and you can change all ports and server roles in the members file. For example, in the ZK shell:

          > reconfig -remove 1 -add 5=localhost:1234:1235:observer;1236
          > reconfig -file new.members
          > reconfig -remove 3 -from 123456 (reconfigure only if the current config is 123456)

          By doing "-add " and giving different ports/role to a server that is already in the cluster, you'll change its ports/role.

          Clients can subscribe to membership changes (membership is stored in a special znode, ZooDefs.CONFIG_NODE) and update the ZooKeeper handle by executing "zk.updateServerList(hostList)". See src/java/systest/ReconfigClient.java or GenerateLoad.java for an example.

          I intend to update the algorithm to get better load balancing when new servers are added to the system (the current method spreads load well only when servers are removed).

          Alex

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12493987/zookeeper-dev-fatjar.jar
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 390 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/523//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          Alex - AWESOME!! So glad to see a patch posted here. I've been very anxiously awaiting this and the company I work for is very eager to start working on integrating this into our environment and start testing it. I'll start working on integrating this and working on some unit tests this week. Very nice job. Only looked over the code in the patch briefly but it looks very very promising. So glad to see this finally coming together. Be in touch soon regarding testing...

          -Marshall

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494127/zookeeper-reconfig-sep12.patch
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 54 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          -1 javac. The applied patch generated 11 javac compiler warnings (more than the trunk's current 10 warnings).

          -1 findbugs. The patch appears to introduce 16 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/529//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/529//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/529//console

          This message is automatically generated.

          Patrick Hunt added a comment -

          Cancelling the patch for now based on the issues the QA bot found.

          Eugene Koontz added a comment -

          Would be great to have both ZOOKEEPER-107 and ZOOKEEPER-1045 working together: new servers would be able to mutually authenticate with an existing quorum upon joining it.

          Marshall McMullen added a comment -

          Any chance there's an updated version of this patch against the latest ZooKeeper 3.4.3?

          I've finally been given time to get this integrated into our product at the company I work for, and I'm anxious to finally pick it up and work on ZooKeeper again... I'd appreciate an updated patch if anyone has one.

          Thanks!

          Alexander Shraer added a comment -

          Marshall, that's great!!

          I'll prepare a patch later today or tomorrow. There's also the ZK-1355 patch, which is part of this feature. In general, the plan is to break this ZK-107 patch into several smaller ones and integrate them separately, starting with ZK-1355. One thing you could help with is the C client change for ZK-1355 - this is a very self-contained thing you could do if you have the time (the Java version is already there; there's only a small change to it I intend to make following a comment from Flavio). There are other things you could help with if you have time.

          Alex

          Marshall McMullen added a comment -

          Alex, if you can get a patch for ZK-107 and ZK-1355 that applies cleanly to ZooKeeper 3.4.3, I'll absolutely commit to doing the C client changes for ZK-1355. I've done a ton of work in the C client, so I know it inside and out (I did the multi-op support).

          Looking forward to a new patch and I'll get started.

          Alexander Shraer added a comment -

          This is not intended for inclusion at this point, just the current version that people can play with.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516482/ZOOKEEPER-107-28-Feb.patch
          against trunk revision 1294000.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/968//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/968//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516486/ZOOKEEPER-107-29-Feb.patch
          against trunk revision 1294000.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/969//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/969//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/969//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          not for inclusion.

          • Added a backward-compatibility mode for configuration files: the feature generally requires that the config file point to a separate membership file (where the server, group and weight definitions are supposed to be), but it will now also work with a standard single config file. In that case, if a reconfiguration happens, we create a membership file and edit the config file to point to the new membership file. Otherwise the config file remains unchanged.
          • Fixed a small problem in the C client
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516769/ZOOKEEPER-107-1-Mar.patch
          against trunk revision 1294000.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 42 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/970//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/970//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/970//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          Latest Patch (1 Mar) applies cleanly for me and resolved the issues we discussed earlier. Thanks Alex.

          Marshall McMullen added a comment -

          Alex, below are some comments I have on the C code only.

          1. Use C naming conventions. Specifically:

          zhandle_t zh, char **joiningServers, int numJoiners, char **leavingServers, int numLeaving, char *newMembers, int newMembersLen, int64_t fromConfig, char *buffer, int buffer_len

          This should be:

          zhandle_t zh, char **joining_servers, int num_joiners, char **leaving_servers, int num_leaving, char *new_members, int new_members_len, int64_t from_config, char *buffer, int buffer_len

          2. The C code tries very hard to keep the line lengths at 80 characters. A lot of your code does not.

          3. I think that joiningServers and leavingServers should be const, as certainly you aren't changing these in the C code. And it wouldn't make sense for the contract of these functions if you ever needed to. It also makes it easier to call from C++ code, where I can create a vector and then pass its contents into the C code without having to const-cast things.

          4. newMembers is badly named since it's a FILE. Should be e.g. newMembershipFile or something

          5. newMembersLen is also badly named as it's the length of the filename not the length of the new members themselves.

          6. fromConfig should have more meaningful name like old_version.

          7. I totally don't understand the purpose of the buffer and buffer_len arguments. Why would they ever be anything other than the list I provided? This seems unnecessary. Moreover, I don't see why you'd expose the raw contents of this buffer to the caller.

          8. What's the purpose of the stat argument? Not sure I understand the use case for why I'd want the stat structure of the underlying /zookeeper/config path. Or is this just for getting the version of that znode?

          9. I find the combination of the non-incremental and incremental arguments into the same function overly complicated. Specifically if you're giving the "newMembership" arguments the rest are ignored (or must be NULL) but if you're in incremental mode, the non-incremental args must be NULL. Easy to get that wrong or at least be confused by this. I think it'd be cleaner to have a separate interface for the nonincremental mode... perhaps a zoo_reconfig and a zoo_reconfig_nonincremental or zoo_reconfig_fromfile or some such.

          Since I'm doing a lot of improvements to the C code already, let me know your thoughts on the above and I'll make the necessary changes.

          Alexander Shraer added a comment -

          Hi Marshall,

          Thanks for the comments. Regarding 7 - zoo_getconfig was made to be very similar to zoo_get, except that it doesn't get a path as a parameter and it reads the special configuration znode. As the comments in the code say, the buffer holds the configuration returned by the server and buffer_len is the size of the "returned" buffer. Similarly, the answer to 8 is that it was made to be similar to zoo_get. The configuration version is actually part of the returned config data itself, although (if I remember correctly) it should be identical to the znode version. It's something that is also stored in the membership file and used for comparing configs, so it's always part of the config.

          I would be happy if you fix the other issues you raised. Would you like to update the last patch here and upload it?
          I'd actually be happy to pass the C part to you entirely, I'm just not sure how that would work in JIRA.

          Thanks,
          Alex

          P.S. This is not for inclusion. Once 1411 gets in, I'd like to open another JIRA with the part of the 107 patch missing from 1411 (the actual reconfiguration algorithm).

          Marshall McMullen added a comment -

          Thanks for the clarification Alex. After your explanations I agree with everything you've said.

          I'll make the agreed upon changes and upload a new patch some time tomorrow.

          I'm happy to maintain and work on the C part regardless of how things look in JIRA. I can always just download your latest changes, and add any necessary C changes, then re-upload the patch.

          Alexander Shraer added a comment -

          I removed the contents of ZK-1411 (which is now in trunk) from the 107 patch. The current patch applies to the new trunk, but requires more work (partly to make sure that I use Commons CLI correctly for the reconfig / config commands).

          Notice that to refer to the dynamic configuration file from the static one, you should write "dynamicConfigFile=..." and not "membershipFile=..." as with the previous patches posted here.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12523765/ZOOKEEPER-107-22-Apr.patch
          against trunk revision 1328991.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1048//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1048//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1048//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Attaching a paper about this work that will appear in USENIX Annual Technical Conference 2012.

          Alex

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12525379/zkreconfig-usenixatc-final.pdf
          against trunk revision 1331246.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1058//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          org.apache.zookeeper.test.ReconfigTest includes a lot of reconfiguration tests. There are still a few tests I need to add.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12537416/ZOOKEEPER-107-20-July.patch
          against trunk revision 1362660.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 38 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1141//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1141//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1141//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12537463/ZOOKEEPER-107-21-July.patch
          against trunk revision 1362660.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 26 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12537463/ZOOKEEPER-107-21-July.patch against trunk revision 1362660. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 26 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1143//console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          About the test failure, it seems to be interpreting a file as a directory:

          directory /tmp/test8632392252062014508dir/version-2/currentEpoch.tmp listFiles returned null

          I've seen a few complaints online here and there about isFile() not evaluating correctly, but it's unclear why it is not correct in this case. It doesn't seem to have unusual characters or anything. I can't tell why it is failing just for this patch. Is there anything special you're doing with that currentEpoch file, Alex?

          Alexander Shraer added a comment -

          Hi Flavio,

          I haven't touched currentEpoch. But it seems like this file shouldn't exist at all. If I'm not mistaken, it is created in QuorumPeer.writeLongToFile (a recent addition that writes atomically), called from QuorumPeer.setCurrentEpoch, and the comments there say that the tmp file will exist only if the atomic write failed. I'm not sure why this happens, and actually I couldn't find the corresponding error message in the Hudson execution logs (maybe I wasn't looking in the right place).

          Alex
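
          For reference, a minimal sketch (not the actual QuorumPeer.writeLongToFile code) of the write-to-temp-then-rename pattern being discussed; the ".tmp" file should only be left behind if the write or the rename failed:

          import java.io.File;
          import java.io.FileOutputStream;
          import java.io.IOException;

          // Illustrative only: write the value to a temporary file, sync it to
          // disk, then rename it over the target.
          public final class AtomicLongFileWriter {
              public static void writeLongAtomically(File target, long value) throws IOException {
                  File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
                  FileOutputStream out = new FileOutputStream(tmp);
                  try {
                      out.write(Long.toString(value).getBytes("UTF-8"));
                      out.getFD().sync();      // make sure the bytes hit the disk
                  } finally {
                      out.close();
                  }
                  if (!tmp.renameTo(target)) { // rename is atomic on most local filesystems
                      throw new IOException("Failed to rename " + tmp + " to " + target);
                  }
              }
          }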

          Patrick Hunt added a comment -

          The listFiles failure is probably related to ZOOKEEPER-1522?

          Likely a timing issue where the file/directory is deleted out from under you.

          Flavio Junqueira added a comment -

          Pat, check my last comment. For some reason it thinks that currentEpoch.tmp (which I believe is a file) is a directory and tries to list its files. If not that, then I don't understand this output message:

          directory /tmp/test8632392252062014508dir/version-2/currentEpoch.tmp listFiles returned null
          
          Patrick Hunt added a comment -

          Flavio, here's what I think is happening. listFiles() returns null when the File doesn't exist. Also check isFile() - it returns true iff the File is a file and it exists. So in this case the file existed, was then deleted out from under the thread, isFile() returned false, and the subsequent listFiles() returned null because the File was gone. See my patch in ZOOKEEPER-1522, which addresses this issue.
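
          A minimal sketch of the race Patrick describes, assuming a recursive-cleanup helper of the kind the test uses (the class and method names here are hypothetical, not taken from the patch): treat a null return from listFiles() as "not a directory, or no longer exists" instead of trusting an earlier isFile() check.

            import java.io.File;

            // Hypothetical helper illustrating the race: between an isFile() check and the
            // listFiles() call, another thread may delete the path. listFiles() then returns
            // null, which must be handled rather than treated as an error.
            class RecursiveDelete {
                static void deleteRecursively(File path) {
                    if (!path.isFile()) {               // false if path is a file OR already gone
                        File[] children = path.listFiles();
                        if (children != null) {         // null if not a directory, or deleted meanwhile
                            for (File child : children) {
                                deleteRecursively(child);
                            }
                        }
                    }
                    path.delete();                      // best-effort; a missing file is fine here
                }
            }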

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541707/ZOOKEEPER-107-Aug-20.patch
          against trunk revision 1373156.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 35 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1163//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1163//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1163//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12541712/ZOOKEEPER-107-Aug-20-ver1.patch
          against trunk revision 1373156.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 35 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1164//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1164//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1164//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          I'm not sure why these 3 tests fail; it seems like some timing issue. They pass locally for me.

          Alexander Shraer added a comment -

          minor refactoring

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12542444/ZOOKEEPER-107-Aug-25.patch
          against trunk revision 1373156.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 33 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1171//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1171//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1171//console

          This message is automatically generated.

          Benjamin Reed added a comment -

          Looks good. A couple of small comments:

          1) you need a test case for async reconfig. It appears that ReconfigCallback is not used.
          2) need to comment the backward-compatibility processing in FastLeaderElection.
          3) the check for a short packet in the run() of FastLeaderElection needs to be moved up, before any packet processing takes place.
          4) in LearnerHandler, some of the info messages should be debug
          5) in LearnerHandler, newLeaderData was changed to be 12 bytes, but only the first 4 are used as before. Why?
          6) in Leader, it doesn't look like SHUTDOWN is fully implemented; we should remove it
          7) there are warns and errors in Leader and QuorumPeer that look like they need to be debug
          8) add a comment in processAck to explain why multiple outstanding proposals may be ready to commit after a reconfig
          9) in QuorumUtil, please put a comment with the JIRA number that relates to the broken JMX code

          Great set of tests! I think we have excellent coverage of scenarios. We just need a simple test for 1) to make sure it works.

          Alexander Shraer added a comment -

          Hi Ben,

          Thanks a lot for reviewing the patch!
          I uploaded a new patch with a test for async reconfig and all your other comments addressed.

          Alex

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546251/ZOOKEEPER-107-23-SEP.patch
          against trunk revision 1386496.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 37 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1187//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1187//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12546253/ZOOKEEPER-107-23-SEP.patch
          against trunk revision 1386496.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 37 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1188//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1188//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1188//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547130/ZOOKEEPER-107-29-SEP.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1194//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1194//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1194//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547144/ZOOKEEPER-107-29-SEP.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1195//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1195//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1195//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Locally, all tests pass for me. The failing tests on Hudson are different each time, and no line number is shown.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547153/ZOOKEEPER-107-29-SEP.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1196//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1196//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1196//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547167/ZOOKEEPER-107-29-SEP.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1197//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1197//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1197//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547194/ZOOKEEPER-107-29-SEP.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1200//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1200//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1200//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12547399/ZOOKEEPER-107-3-Oct.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1205//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1205//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1205//console

          This message is automatically generated.

          Jacky007 added a comment -

          The result is the same in my environment. As I see it,
          in the testRemoveAddOne method:

                      String configStr = reconfig(zk1, null, leavingServers, null, -1);
                      testServerHasConfig(zk2, null, leavingServers);
                      testNormalOperation(zk2, zk1);
          

          When i = 1 in the for loop, testServerHasConfig will sometimes hang at zk.getConfig(false, new Stat()).
          It is very strange.

          Jacky007 added a comment -

          When this happens, the servers are inconsistent.

          ~$echo srvr | nc 127.0.0.1 11238
          Zookeeper version: 3.5.0-1395456, built on 10/08/2012 06:12 GMT
          Latency min/avg/max: 0/0/1
          Received: 25
          Sent: 24
          Connections: 2
          Outstanding: 0
          Zxid: 0x200000000
          Mode: leader
          Node count: 6
          
          ~$echo srvr | nc 127.0.0.1 11241
          Zookeeper version: 3.5.0-1395456, built on 10/08/2012 06:12 GMT
          Latency min/avg/max: 0/0/1
          Received: 25
          Sent: 24
          Connections: 2
          Outstanding: 0
          Zxid: 0x10000000b
          Mode: follower
          Node count: 6
          
          ~$echo srvr | nc 127.0.0.1 11244
          Zookeeper version: 3.5.0-1395456, built on 10/08/2012 06:12 GMT
          Latency min/avg/max: 0/0/1
          Received: 25
          Sent: 24
          Connections: 2
          Outstanding: 0
          Zxid: 0x10000000b
          Mode: follower
          Node count: 6
          
          Alexander Shraer added a comment -

          Thanks a lot Jacky! I'll try to reproduce this during the weekend.

          Jacky007 added a comment -

          Hi Alexander Shraer, it is a bug in the Java client.
          In the doIO method of ClientCnxnSocketNIO:

            if (p != null) {
                outgoingQueue.removeFirstOccurrence(p);
                updateLastSend();
                if ((p.requestHeader != null) &&
                        (p.requestHeader.getType() != OpCode.ping) &&
                        (p.requestHeader.getType() != OpCode.auth)) {
                    p.requestHeader.setXid(cnxn.getXid());
                }
                p.createBB();
                ByteBuffer pbb = p.bb;
                sock.write(pbb);
                if (!pbb.hasRemaining()) {
                    sentCount++;
                    if (p.requestHeader != null
                            && p.requestHeader.getType() != OpCode.ping
                            && p.requestHeader.getType() != OpCode.auth) {
                        pending.add(p);
                    }
                }
            }

          Removing the packet from outgoingQueue before the sock.write() line can cause some requests to wait forever. We should remove it after the sock.write() line instead.
          Should we open a new JIRA for this?
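
          For illustration, a minimal sketch of the reordering suggested here - this is not the actual ZOOKEEPER-1561 patch, just the excerpt above with the dequeue moved after the write completes (field and helper names are the ones quoted above):

            if (p != null) {
                updateLastSend();
                if ((p.requestHeader != null) &&
                        (p.requestHeader.getType() != OpCode.ping) &&
                        (p.requestHeader.getType() != OpCode.auth)) {
                    p.requestHeader.setXid(cnxn.getXid());
                }
                p.createBB();
                ByteBuffer pbb = p.bb;
                sock.write(pbb);
                if (!pbb.hasRemaining()) {
                    // Only once the packet is fully written is it removed from
                    // outgoingQueue and tracked in pending.
                    sentCount++;
                    outgoingQueue.removeFirstOccurrence(p);
                    if (p.requestHeader != null
                            && p.requestHeader.getType() != OpCode.ping
                            && p.requestHeader.getType() != OpCode.auth) {
                        pending.add(p);
                    }
                }
            }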

          Alexander Shraer added a comment -

          Hi Jacky,

          Great catch! It does seem that it would be better to first try to send the packet and only
          remove it from the outgoingQueue once it was completely sent. I wonder if we should also throw an
          exception if pbb.hasRemaining()?

          > Should we open a new JIRA for this?

          That would be great, thanks!

          Do the reconfiguration tests pass for you once this bug is fixed? I wonder why, of all tests,
          the reconfiguration tests were the ones hitting this bug... maybe because of the many
          leader/follower shutdowns that they do; not sure.

          Thanks,
          Alex

          Jacky007 added a comment -

          > Do the reconfiguration tests pass for you once this bug is fixed? I wonder why, of all tests,
          > the reconfiguration tests were the ones hitting this bug... maybe because of the many
          > leader/follower shutdowns that they do; not sure.

          Yes, all tests pass here.

          If the server is restarting when we send a getReconfig(getData) request, the bug will happen.

          Jacky007 added a comment -

          Opened a new JIRA, ZOOKEEPER-1561, for it.

          Marshall McMullen added a comment -

          This version changes both the Java and C APIs to use comma-separated strings for all three arguments: joiningServers, leavingServers and newMembers.

          This approach drastically simplifies the C code in particular, and also simplifies the jute serialization/deserialization, as each argument is now just a string instead of a vector of strings.

          Note in particular that newMembers is now a plain string and not a char**. This also greatly simplifies the job of applications sitting on top of the C bindings.

          Also modified both the Java and C CLIs to support this new format.
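
          As a rough illustration of the comma-separated format described above, a hedged usage sketch from the Java side (the server strings, ports, and the exact reconfig signature are assumptions for illustration and may differ from the committed patch):

            import org.apache.zookeeper.ZooKeeper;
            import org.apache.zookeeper.data.Stat;

            // Hypothetical incremental reconfig: add server 5 and remove server 3.
            // joiningServers/leavingServers are plain comma-separated strings, as described above;
            // the reconfig(...) signature shown here is an assumption, not taken from the patch.
            class ReconfigExample {
                static void addAndRemoveServers(ZooKeeper zk) throws Exception {
                    String joining = "server.5=127.0.0.1:1238:1239:participant;127.0.0.1:1240";
                    String leaving = "3";
                    // null newMembers => incremental change; -1 => no expected config version
                    zk.reconfig(joining, leaving, null, -1, new Stat());
                }
            }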

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549110/ZOOKEEPER-107-14-Oct.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1226//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1226//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1226//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549111/ZOOKEEPER-107-15-Oct.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1227//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1227//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1227//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          fixed joinStrings.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549155/ZOOKEEPER-107-15-Oct-ver1.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1228//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1228//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1228//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549162/ZOOKEEPER-107-15-Oct-ver2.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1229//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1229//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549165/ZOOKEEPER-107-15-Oct-ver3.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1230//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1230//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1230//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          This latest failure looks like ZOOKEEPER-1561

          Ted Yu added a comment -

          I wonder if patch v7 from ZOOKEEPER-1560 would help prevent such test failures.
          I will combine the two patches and run these two tests locally.

          Eugene Koontz added a comment -

          I was thinking the same thing, Ted. Thanks for investigating.

          Ted Yu added a comment -

          ReconfigTest hung with the combined patch.
          The test output is quite long (25 MB).
          I am not familiar with ReconfigTest, so I'm not sure what to look for in the output.

                                  LOG.warn("Couldn't write " + expectedSize + " bytes, "
          

          I verified that the above log message, which I added in patch v7 for ZOOKEEPER-1560, didn't appear in the test output.

          Alexander Shraer added a comment -

          Please ignore this patch - it's another test to see whether the bug is related to ZOOKEEPER-1561.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549264/ZOOKEEPER-107-bug1.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1231//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1231//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1231//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Running ZOOKEEPER-107-bug1.patch verified that ZK-1561 is happening in the failing test, though I'm not sure that's the reason it hangs. The new patch dequeues p only after the write completes.

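          For context, the "dequeues p after the write completes" idea is the usual peek-write-then-remove pattern: the message stays at the head of the queue until the socket write has succeeded, so a failed write can be retried instead of being silently dropped. A minimal, hypothetical sketch of that pattern in Java (class and method names are illustrative, not the actual QuorumCnxManager code):

          import java.io.DataOutputStream;
          import java.io.IOException;
          import java.util.concurrent.ArrayBlockingQueue;

          // Sketch of "dequeue only after the write completes". The sender peeks at the
          // head of the queue, writes it to the socket, and removes it only once the
          // write has succeeded; if the write throws, the message stays queued and can
          // be resent after reconnecting.
          class SendWorkerSketch {
              private final ArrayBlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);

              void sendLoop(DataOutputStream out) throws InterruptedException {
                  while (!Thread.currentThread().isInterrupted()) {
                      byte[] p = queue.peek();      // look at the head, but do not remove it yet
                      if (p == null) {
                          Thread.sleep(10);         // nothing to send right now
                          continue;
                      }
                      try {
                          out.writeInt(p.length);
                          out.write(p);
                          out.flush();
                          queue.poll();             // dequeue p only after the write completed
                      } catch (IOException e) {
                          return;                   // p is still queued and can be retried later
                      }
                  }
              }
          }
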
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12549273/ZOOKEEPER-107-bug2.patch
          against trunk revision 1391526.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1232//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1232//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1232//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          trying again, now that 1560 is fixed

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552426/ZOOKEEPER-107-6-NOV-2.patch
          against trunk revision 1404288.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 40 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1254//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1254//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1254//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          this time including Marshall's latest changes

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552495/ZOOKEEPER-107-7-NOV.patch
          against trunk revision 1404288.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1255//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1255//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1255//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552523/ZOOKEEPER-107-7-NOV-ver1.patch
          against trunk revision 1404288.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1256//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1256//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1256//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          small fix in QuorumCnxManager

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12552608/ZOOKEEPER-107-7-NOV-ver2.patch
          against trunk revision 1404288.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1257//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1257//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1257//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Interesting, all tests actually passed, and yet it says -1 core tests!

          Alexander Shraer added a comment -

          Removed tabs and some trailing spaces

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555318/ZOOKEEPER-107-28-NOV.patch
          against trunk revision 1414566.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 15 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1276//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1276//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555320/ZOOKEEPER-107-28-NOV-ver1.patch
          against trunk revision 1414566.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1277//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1277//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555321/ZOOKEEPER-107-28-NOV-ver2.patch
          against trunk revision 1414566.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1278//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1278//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1278//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Again, all tests passed, but the report says "-1 core tests". Does anyone know why this is happening?

          Marshall McMullen added a comment -

          The "core tests" includes both the Java tests and the C tests. If you look only at the java section, you'll think all the tests passed. In this case, if you click on the full output, you'll see:

          [exec] [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/TestReconfig.cc:571: Assertion: assertion failed [Expression: numClientsPerHost.at(i) >= lowerboundClientsPerServer(numClients, numServers)]
          [exec] [exec] Failures !!!
          [exec] [exec] Run: 38 Failure total: 1 Failures: 1 Errors: 0
          [exec] [exec] make: *** [run-check] Error 1
          [exec]
          [exec] BUILD FAILED
          [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1262: The following error occurred while executing this line:
          [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1272: exec returned: 2

          Alexander Shraer added a comment -

          Ah, you're right.

          Looking at this line (committed as part of ZOOKEEPER-1355), this is testing the
          initial distribution of clients across servers, even before any
          rebalancing algorithm is invoked. Perhaps a random number generator
          problem?


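          For readers not looking at TestReconfig.cc: the failing assertion checks that each server ends up with at least a lower-bound share of the randomly assigned clients. Below is a rough Java illustration of that kind of check, using a purely hypothetical bound of half the "fair share"; the actual bound in the C test may be computed differently.

          import java.util.Random;

          public class ClientDistributionSketch {
              // Hypothetical lower bound: each server should host at least half of its
              // fair share of clients. The real test's formula may differ.
              static int lowerBoundClientsPerServer(int numClients, int numServers) {
                  return (numClients / numServers) / 2;
              }

              public static void main(String[] args) {
                  int numClients = 100, numServers = 5;
                  int[] clientsPerServer = new int[numServers];
                  Random rnd = new Random();

                  // Initial distribution: each client picks a uniformly random server,
                  // before any rebalancing algorithm runs.
                  for (int c = 0; c < numClients; c++) {
                      clientsPerServer[rnd.nextInt(numServers)]++;
                  }

                  int bound = lowerBoundClientsPerServer(numClients, numServers);
                  for (int i = 0; i < numServers; i++) {
                      if (clientsPerServer[i] < bound) {
                          System.out.println("Server " + i + " has only " + clientsPerServer[i]
                                  + " clients (< " + bound + ")");
                      }
                  }
              }
          }

          With a fixed bound like this, a uniform random assignment can occasionally dip below it, which would make such a test intermittently flaky even without any code change.
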
          Marshall McMullen added a comment -

          I've created a separate jira to address this test failure since it has nothing to do with ZOOKEEPER-107 directly but is part of the already committed ZOOKEEPER-1355. The new Jira is https://issues.apache.org/jira/browse/ZOOKEEPER-1594, which I assigned to myself.

          Alexander Shraer added a comment -

          Patch for trunk. Two test cases are not working - these are related to client port reconfiguration and are broken because the code needs to be adjusted following the changes in ZOOKEEPER-1504.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12564847/ZOOKEEPER-107-14-Jan.patch
          against trunk revision 1427034.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1337//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1337//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1337//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12565247/ZOOKEEPER-107-16-Jan.patch
          against trunk revision 1433651.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1338//console

          This message is automatically generated.

          Thawan Kooburat added a comment -

          I was looking at the patch directly, so I might have misunderstood some of the details. I think you still need to close the selector when destroying the old acceptor thread (and possibly join with the thread as well). Is the reconfiguring flag part of NIOServerCnxnFactory? Then you will have to set it back to false when reconfiguration is done as well.

          Alexander Shraer added a comment -

          > I think you still need to close the selector when destroying the old acceptor thread (and possibly join with the thread as well)

          Yes, agreed on both points.

          > Is the reconfiguring flag part of NIOServerCnxnFactory? Then you will have to set it back to false when reconfiguration is done as well.

          Yes, I do that:

              if (!reconfiguring) {
                  closeSelector();
                  NIOServerCnxnFactory.this.stop();
              } else {
                  reconfiguring = false;
              }

          But as you said, I'll have to move the closeSelector() call out, since a new selector is going to be opened by the new acceptor thread.

          Thawan Kooburat added a comment -

          You can make the reconfiguring flag part of the AcceptThread, since it is not used elsewhere and you don't have to worry about setting the value back or about a race condition.

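          A minimal sketch of that suggestion, assuming hypothetical field and method names (the real NIOServerCnxnFactory/AcceptThread code is more involved): the flag lives on the AcceptThread being replaced, so each new thread starts with it cleared and nothing ever needs to be reset.

          // Sketch only: the reconfiguring flag belongs to the AcceptThread being
          // replaced, so the old thread skips the full factory shutdown while still
          // releasing its own selector. Names here are illustrative.
          class AcceptThread extends Thread {
              private volatile boolean reconfiguring = false;

              void setReconfiguring() {
                  reconfiguring = true;             // called before starting the replacement thread
              }

              @Override
              public void run() {
                  try {
                      // ... accept loop using this thread's selector ...
                  } finally {
                      closeSelector();              // always close this thread's selector
                      if (!reconfiguring) {
                          stopFactory();            // normal shutdown path only
                      }
                      // when reconfiguring, the new AcceptThread opens its own selector
                  }
              }

              private void closeSelector() { /* close this thread's Selector */ }
              private void stopFactory()   { /* NIOServerCnxnFactory.this.stop() in the real code */ }
          }
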
          Alexander Shraer added a comment -

          sounds good. thanks!

          Alexander Shraer added a comment -

          Patch includes Thawan's suggestion and the C client changes that were missing from the previous couple of patches by mistake.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12565414/ZOOKEEPER-107-17-Jan.patch
          against trunk revision 1434978.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1342//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1342//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1342//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          Thawan, any ideas why it still fails on "too many open files" here?
          https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1342//testReport/org.apache.zookeeper.test/ReconfigTest/testQuorumSystemChange/

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12565494/ZOOKEEPER-107-18-Jan.patch
          against trunk revision 1434978.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 35 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1344//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1344//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1344//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12565541/ZOOKEEPER-107-18-Jan-ver2.patch
          against trunk revision 1434978.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 35 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1346//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1346//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1346//console

          This message is automatically generated.

          Patrick Hunt added a comment -

          Have you tested backward compatibility at all? E.g., running a server in an ensemble with 3.4 servers, and also a rolling upgrade.

          Alexander Shraer added a comment -

          I haven't, and I'd appreciate some help with that...

          I also have a question about the flaky test above (it passed in the first patch I submitted today and failed in the second).
          Sometimes (also on my local machine) in testTransactionLogCorruption, zk1 goes from CONNECTING to CLOSED. Any idea what could cause this?

          Alexander Shraer added a comment -

          It seems that testTransactionLogCorruption is very flaky with ZK-107, but also without it, for example here:

          https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/

          Alexander Shraer added a comment -

          A few fixes related to backward compatibility.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12565731/ZOOKEEPER-107-20-Jan.patch
          against trunk revision 1434978.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1347//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1347//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1347//console

          This message is automatically generated.

          Michi Mutsuzaki added a comment -

          Alex and I worked on writing C tests for this. We have a working skeleton. I'll clean up the patch and post it here sometime this week.

          --Michi

          Marshall McMullen added a comment -

          Awesome job, Alex and Michi. Sorry I couldn't help with this one, but I have way too much on my plate right now. I will definitely review the patch and offer suggestions where I can.

          Alexander Shraer added a comment -

          Edward's comments addressed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12566478/ZOOKEEPER-107-24-Jan.patch
          against trunk revision 1438375.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 28 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 26 release audit warnings (more than the trunk's current 24 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1364//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1364//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1364//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1364//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          We just wanted to give some support for this patch.

          It has been extremely helpful to our Zookeeper deployment strategy. We (www.solidfire.com) use Zookeeper as our database for a distributed scale-out block storage system. We deploy fully automated clusters from 3 to 25+ nodes that must be able to self heal ensemble issues after inevitable node failures. We maintain an ensemble of 3 or 5 nodes, so if one of these ensemble nodes fails we require another node in the cluster to assume its duties as quickly as possible. Prior to this patch, the only way to reconfigure the ensemble was to do a typical "rolling restart" of the Zookeeper process on each of the nodes after modifying the local configuration file, but manually adjusting the ensemble wasn't an option. Automating this was a challenge though since Zookeeper also contained our authoritative node list (chicken and egg), which meant that cascading failures could easily lead to inconsistencies in configuration files and then the ensemble could get easily confused. Additionally, the re-connection events caused by restarting servers complicated the centralized logic that was coordinating the reconfiguration.

          This is where Zookeeper-107 has been a life saver. We integrated this patch into our production version of Zookeeper-3.5.0 about a year ago and have been running with it since. The simplicity of the interface and the reliability it has shown have made handling ensemble node failures significantly easier. We exclusively use the c-client and have not had any problems with it. Currently we only use the "full reconfiguration" functionality rather than the "incremental" reconfiguration. In integrating this functionality with our coordination code, all of the problems we ran into were ours and the patch showed solid resilience to the many error scenarios we've thrown at it. Furthermore, Alex was super helpful and responsive in helping us integrate this into our environment. Eventually we want to write up a small white paper on exactly how we manage our ensemble, but we don't have the time right now.

          Just to give some perspective on the amount of stress this patch has gotten in our codebase, here are some stats. The reconfiguration code has run 10,000+ times within our integration testing and product qualification environments. The sorts of tests we have in our integration testing environment are heavily focused on reconfiguring the ensemble while there are concurrent operations in flight simultaneously with the reconfiguration operation. We've never seen an operation fail or get missed in any way during the reconfiguration. We also have an extensive set of tests which essentially create a three-node ensemble and then add two more nodes and grow the ensemble to 5. The test has two different ZooKeeper clients – one connected to the original set of 3 and one connected to the new set of 2 that is joining the ensemble. While we are reconfiguring the ensemble to grow from 3 to 5, we hammer the original 3 nodes with write operations. Once the reconfiguration is complete, we validate that the new 2 nodes see all the operations submitted to the original 3 nodes. We've never seen any problems with any of our tests!

          Given its stability in house, we are now actively deploying Zookeeper-3.5.0 with this patch to several customers running production environments (and many more running test environments), which combined with our internal test environment means we have this software running on 200+ nodes in 25-50+ clusters at any one time. Our website shows the importance these customers place on our software, so you can be sure we've tested it extensively in hundreds of different reconfiguration scenarios, and we haven't had to file a single jira yet.

          Flavio Junqueira added a comment -

          Thanks for sharing your experience, Marshall. I find it useful to hear that the patch was tested in a real production setting. I also very much want to see this go in, not only because it is a very useful feature, but also because it is blocking other patches.

          Is anyone looking at the -1 for core tests and release audit from the last jenkins run?

          Alexander Shraer added a comment -

          The core test failure is due to C compilation errors. Michi fixed these and is adding C tests, which he'll probably upload this week.

          Michi Mutsuzaki added a comment -

          By the way, you don't need to rename the patch when you upload a new one. You can use the same name (e.g. ZOOKEEPER-107.patch), and JIRA automatically highlights the latest patch. It's getting pretty hard to figure out which patch is the latest.

          --Michi

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567810/ZOOKEEPER-107-3-Feb.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause tar ant target to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1378//testReport/
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1378//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567812/ZOOKEEPER-107-3-Feb-ver1.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 27 release audit warnings (more than the trunk's current 26 warnings).

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1379//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1379//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1379//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1379//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567820/ZOOKEEPER-107-3-Feb-ver2.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1380//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1380//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1380//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567912/ZOOKEEPER-107-4-Feb.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1382//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1382//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1382//console

          This message is automatically generated.

          Patrick Hunt added a comment -

          The latest patch introduces a new dependency on Python in the build system. We should not do this. I strongly recommend we convert it to a bash script, or drive it from C/C++ similar to all of our other C client tests.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567917/ZOOKEEPER-107-4-Feb-ver1.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1383//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1383//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1383//console

          This message is automatically generated.

          Marshall McMullen added a comment -

          I agree with Patrick that we should not introduce a dependency on Python for the unit tests. There are lots of automated build/test regression suites that build ZooKeeper and don't necessarily have or want a Python dependency (mine included).

          Is there a reason you deviated from using the C++ test-driver infrastructure?

          Alexander Shraer added a comment -

          It was just much, much easier to write a script that uses the C CLI than to write (or even understand) ZooKeeper's C++ tests. After looking at the existing tests, we were not even sure that any of them are sufficiently close to what we need to do.
          In addition, this way we also exercise the C CLI. Michi is working on converting the script to bash.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567949/ZOOKEEPER-107-4-Feb-ver2.patch
          against trunk revision 1441922.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 27 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1384//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1384//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1384//console

          This message is automatically generated.

          Benjamin Reed added a comment -

          Great work, you guys! Can you comment on why FdLeakTest was removed? Also, I think we should have added LearnerInfoVX, where X is the protocol version, rather than changing LearnerInfo. (We did bump the protocol, right?)

          Alexander Shraer added a comment -

          Thanks, Ben! The test was removed by mistake - great catch! I'll fix it in the next patch I upload.

          We did not bump the protocol version. I suggest that if this is needed, we do it as a separate patch, since it may affect other areas of the code besides the ones I touched (and may break other things unrelated to reconfiguration).

          LearnerInfo was already changed once without bumping the protocol, so we will still have the "if" statement there as we do now.
          But if we bump the version, we could do what you suggest and get rid of the last part of the "if" that I've added.

          Alex
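
          To make the trade-off concrete, here is a minimal, purely illustrative Java sketch of the length-based branching described above: a single LearnerInfo record grows new fields over time, and the reader branches on how many bytes the learner actually sent. The field layout, sizes, and defaults below are assumptions for illustration, not the actual LearnerHandler code.

          import java.nio.ByteBuffer;

          public class LearnerInfoSketch {
              // Illustrative layout only: 8 bytes server id, then optionally
              // 4 bytes protocol version, then optionally 8 bytes config version
              // (the kind of field this patch adds).
              static void parse(byte[] learnerInfoData) {
                  ByteBuffer bb = ByteBuffer.wrap(learnerInfoData);
                  long sid = bb.getLong();      // always present
                  int protocolVersion = 0;      // assumed default for old learners
                  long configVersion = -1;      // "unknown" for pre-reconfig learners
                  if (bb.remaining() >= 4) {
                      protocolVersion = bb.getInt();   // added by an earlier LearnerInfo change
                  }
                  if (bb.remaining() >= 8) {
                      configVersion = bb.getLong();    // newest field
                  }
                  System.out.println("sid=" + sid + " proto=" + protocolVersion
                          + " config=" + configVersion);
              }

              public static void main(String[] args) {
                  // A 12-byte message from an older learner: sid + protocol version only.
                  byte[] old = ByteBuffer.allocate(12).putLong(1L).putInt(1).array();
                  parse(old);
              }
          }

          The alternative Ben suggests - a separate LearnerInfoVX record per protocol version - trades this length sniffing for an explicit version bump up front.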

          Alexander Shraer added a comment -

          Michi completed the C client tests; they are included in the attached patch.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571782/ZOOKEEPER-107-2-Mar.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 46 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1418//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1418//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1418//console

          This message is automatically generated.

          Alexander Shraer added a comment -

          I think that this patch is ready to be committed.

          With this patch, an upgrade to 3.5.0 will have to go through 3.4.6 (see ZOOKEEPER-1633).

          Alexander Shraer added a comment -

          Small fix to the reconfig Java CLI command.
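
          For readers following the thread, here is a minimal sketch of how a client might drive the feature through the Java API this patch adds. It assumes the reconfig(joining, leaving, newMembers, fromConfig, stat) and getConfig(watch, stat) methods on the ZooKeeper handle; all host names, ports, and server ids are illustrative.

          import org.apache.zookeeper.WatchedEvent;
          import org.apache.zookeeper.Watcher;
          import org.apache.zookeeper.ZooKeeper;
          import org.apache.zookeeper.data.Stat;

          public class ReconfigSketch {
              public static void main(String[] args) throws Exception {
                  // Connect to the current ensemble (addresses are illustrative).
                  ZooKeeper zk = new ZooKeeper("host1:2181,host2:2181,host3:2181", 30000,
                          new Watcher() {
                              public void process(WatchedEvent event) { /* ignore events here */ }
                          });

                  // Read the current dynamic configuration.
                  Stat stat = new Stat();
                  byte[] current = zk.getConfig(false, stat);
                  System.out.println("current config:\n" + new String(current));

                  // Incremental reconfiguration: add server 4, remove server 2.
                  // fromConfig = -1 means the change is not conditioned on a config version.
                  byte[] updated = zk.reconfig(
                          "server.4=host4:2888:3888:participant;2181",  // joining
                          "2",                                          // leaving
                          null,                                         // no full membership list
                          -1,
                          stat);
                  System.out.println("new config:\n" + new String(updated));

                  zk.close();
              }
          }

          A non-incremental ("full") reconfiguration, like the one Marshall's team describes above, would instead pass null for the joining/leaving arguments and the complete new membership as the third argument.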

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571991/ZOOKEEPER-107-5-Mar.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 49 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1422//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1422//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1422//console

          This message is automatically generated.

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12572003/ZOOKEEPER-107-5-Mar-ver2.patch
          against trunk revision 1448007.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 50 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1423//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1423//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1423//console

          This message is automatically generated.

          Benjamin Reed added a comment -

          +1. Awesome work, you guys! Looks ready to me. Any objections?

          Eric Yang added a comment -

          +1 non-binding. Thank you for the great work.

          Benjamin Reed added a comment -

          Committed revision 1453693.
          Great work, everyone!

          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1855 (See https://builds.apache.org/job/ZooKeeper-trunk/1855/)
          ZOOKEEPER-107. Allow dynamic changes to server cluster membership (Alex Shraer via breed) (Revision 1453693)

          Result = FAILURE
          breed : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1453693
          Files :

          • /zookeeper/trunk/.gitignore
          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/build.xml
          • /zookeeper/trunk/src/c/Makefile.am
          • /zookeeper/trunk/src/c/README
          • /zookeeper/trunk/src/c/include/proto.h
          • /zookeeper/trunk/src/c/include/zookeeper.h
          • /zookeeper/trunk/src/c/src/cli.c
          • /zookeeper/trunk/src/c/src/zookeeper.c
          • /zookeeper/trunk/src/c/tests/TestReconfigServer.cc
          • /zookeeper/trunk/src/c/tests/ZooKeeperQuorumServer.cc
          • /zookeeper/trunk/src/c/tests/ZooKeeperQuorumServer.h
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/KeeperException.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooDefs.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeperMain.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/cli/GetConfigCommand.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/cli/ReconfigCommand.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/common/StringUtils.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/Request.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ServerCnxnFactory.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/TraceFormatter.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZKDatabase.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/CommitProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Follower.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Observer.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ObserverRequestProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumBean.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerConfig.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumZooKeeperServer.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/ReadOnlyRequestProcessor.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/util/ConfigUtils.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/util/SerializeUtils.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerTestBase.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/ReconfigRecoveryTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/QuorumMajorityTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/QuorumUtil.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/ReconfigTest.java
          • /zookeeper/trunk/src/zookeeper.jute
          Camille Fournier added a comment -

          This hasn't yet passed Jenkins, and someone needs to look at why the build is failing with this change. I notice that pre-commit builds have also been failing since we put this change in, and I'm concerned it has caused larger problems. Can someone take a look?

          Alexander Shraer added a comment -

          The Jenkins failures so far were caused by two bugs: the first is addressed by ZOOKEEPER-1662 (which still needs to be committed); the second is in the C tests, and Michi is looking into it (he wrote the C tests).

          Camille Fournier added a comment -

          OK. I'm not a fan of closing tickets until they get through at least one successful build after they've been checked in, which we haven't seen yet for this ticket. I'm definitely OK with considering disabling the truncate test until it gets fixed, but I wish we weren't claiming this work done when it has left the build in a broken state. Let's see if we can at least get the truncate test disabled and 1662 committed, to see whether that fixes the build.

          Alexander Shraer added a comment -

          Camille, I'm not claiming it's done; I definitely want to see the problems resolved. I've also opened two related JIRAs (updating the documentation, which I'm doing, and JMX support, which I hope someone who is familiar with JMX can take).

          1662 is one problem, but the build will still fail until the C tests are fixed. Hopefully we'll figure out that bug soon.

          Thanks,
          Alex

          Camille Fournier added a comment -

          Didn't mean to imply you were, Alex! I will try to see what I can do with the truncate test stuff tomorrow so we can get 1662 in. We'll get this finished up; it's a great feature.

          Michi Mutsuzaki added a comment -

          Modified the C++ reconfig test to wait for the cluster to start by polling the server status every second for up to 10 seconds.

          https://issues.apache.org/jira/secure/attachment/12573026/ZOOKEEPER-107.patch
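
          The actual change is in the C++ test, but the idea is easy to see in a standalone sketch: poll the server's client port with the "ruok" four-letter word once per second until it answers "imok" or the deadline passes. Host, port, and timeout below are illustrative.

          import java.io.IOException;
          import java.io.InputStream;
          import java.net.InetSocketAddress;
          import java.net.Socket;

          public class WaitForServerUp {
              /** Polls once per second, up to timeoutSeconds, until the server answers "imok". */
              static boolean waitForServerUp(String host, int port, int timeoutSeconds)
                      throws InterruptedException {
                  for (int attempt = 0; attempt < timeoutSeconds; attempt++) {
                      try (Socket sock = new Socket()) {
                          sock.connect(new InetSocketAddress(host, port), 1000);
                          sock.getOutputStream().write("ruok".getBytes());
                          sock.shutdownOutput();
                          InputStream in = sock.getInputStream();
                          byte[] buf = new byte[4];
                          int off = 0, n;
                          while (off < 4 && (n = in.read(buf, off, 4 - off)) > 0) {
                              off += n;
                          }
                          if (off == 4 && "imok".equals(new String(buf))) {
                              return true;
                          }
                      } catch (IOException e) {
                          // Server not accepting connections yet; fall through and retry.
                      }
                      Thread.sleep(1000);
                  }
                  return false;
              }

              public static void main(String[] args) throws InterruptedException {
                  System.out.println(waitForServerUp("127.0.0.1", 2181, 10));
              }
          }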

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12573026/ZOOKEEPER-107.patch
          against trunk revision 1453693.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1428//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1428//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1428//console

          This message is automatically generated.

          Camille Fournier added a comment -

          Looks good, Michi. I'm checking this fix in.

          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1860 (See https://builds.apache.org/job/ZooKeeper-trunk/1860/)
          ZOOKEEPER-107. Allow dynamic changes to server cluster membership, small bugfix (michi via camille) (Revision 1455387)

          Result = FAILURE
          camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1455387
          Files :

          • /zookeeper/trunk/src/c/tests/TestReconfigServer.cc
          • /zookeeper/trunk/src/c/tests/ZooKeeperQuorumServer.cc
          Camille Fournier added a comment -

          Michi, it looks like this still isn't reliably fixing the ZooKeeperQuorumServer test. Any other ideas?

          Michi Mutsuzaki added a comment -

          One possibility is that something is already running on port 4000, and the ZooKeeper server is not able to bind to the port. Would it be possible for me to get access to the buildbot? It would be much easier to debug. If not, I can try changing the port to something else.

          --Michi
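
          If buildbot access is hard to arrange, one cheap way to test the port-collision theory is to have the test (or a setup step) probe the port before starting the server. A rough Java sketch, using port 4000 as discussed:

          import java.io.IOException;
          import java.net.ServerSocket;

          public class PortProbe {
              /** Returns true if nothing is currently listening on the given TCP port. */
              static boolean portIsFree(int port) {
                  try (ServerSocket probe = new ServerSocket(port)) {
                      return true;   // bind succeeded, so the port was free; closing releases it
                  } catch (IOException e) {
                      return false;  // bind failed: something is already using the port
                  }
              }

              public static void main(String[] args) {
                  System.out.println("port 4000 free: " + portIsFree(4000));
              }
          }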

          Benjamin Reed added a comment -

          It passed Jenkins right before the commit. How did that happen? Is Jenkins not running properly, or do we have a race condition?
