Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Flexible quorums in zookeeper.
    1. ZOOKEEPER-29.patch
      45 kB
      Flavio Junqueira
    2. ZOOKEEPER-29.patch
      51 kB
      Flavio Junqueira
    3. ZOOKEEPER-29.patch
      49 kB
      Flavio Junqueira
    4. ZOOKEEPER-29.patch
      50 kB
      Flavio Junqueira
    5. ZOOKEEPER-29.patch
      50 kB
      Flavio Junqueira
    6. ZOOKEEPER-29.patch
      50 kB
      Flavio Junqueira
    7. ZOOKEEPER-29.patch
      50 kB
      Flavio Junqueira
    8. ZOOKEEPER-29.patch
      49 kB
      Flavio Junqueira
    9. ZOOKEEPER-29.patch
      49 kB
      Flavio Junqueira
    10. ZOOKEEPER-29.patch
      44 kB
      Flavio Junqueira
    11. ZOOKEEPER-29.patch
      43 kB
      Flavio Junqueira
    12. ZOOKEEPER-29.patch
      44 kB
      Flavio Junqueira
    13. ZOOKEEPER-29.patch
      32 kB
      Flavio Junqueira
    14. ZOOKEEPER-29.patch
      20 kB
      Flavio Junqueira
    15. ZOOKEEPER-29.patch
      10 kB
      Flavio Junqueira
    16. ZOOKEEPER-29.patch
      6 kB
      Flavio Junqueira

      Activity

      Hide
      Flavio Junqueira added a comment -

      Currently quorums are majorities of processes. There are some cases, however, in which a system designer might want more flexibility to specify quorums. For example, applications running across data centers or even inside a data center when servers are spread across racks or simply different subnets. In such cases, we might want to perhaps split the set of servers into sub-groups and assign weights to the sub-groups.

      The main property that a majority of servers satisfy that matters for the correctness of ZooKeeper is the intersection property of such subsets. As long as subsets of servers intersect, they can be used as a quorum system for ZooKeeper.

      In the current code, whenever we need a quorum, we just check if the number is greater than n/2, where n is the total number of servers configured. My initial proposal is to move such a computation to a separate class, call it QuorumSpec for now, that reads a quorum configuration from a configuration file, and provides a method to check if a set of elements contains a quorum, call it "containsQuorum". For example, if we want to check whether a set of votes when electing a leader contains a quorum, then we invoke "containsQuorum" on an instance of "QuorumSpec".

      With respect to configuration, the most flexible way would be to enumerate all possible subsets. Such an option is tedious and prone to configuration errors, though. Instead, I thought of providing for splitting the servers into groups, and assigning weights to the groups. The semantics would the following. We have a quorum if we have votes from a subset of groups such that the sum of the weights of such groups is more than half of the total sum of weights. Also, we include a group in the previous computation only if we have a majority of votes from such a group.

      To illustrate, consider the following example. Suppose that we have 4 groups of servers with weights in parenthesis: A (2), B (1), C (1), and D (1). Now suppose that we have 3 servers in each group. We have a quorum if, for example, we have votes from two processes from each one of A and B. As another example, we have a quorum if we have two votes from processes in each one of B, C, and D.

      Show
      Flavio Junqueira added a comment - Currently quorums are majorities of processes. There are some cases, however, in which a system designer might want more flexibility to specify quorums. For example, applications running across data centers or even inside a data center when servers are spread across racks or simply different subnets. In such cases, we might want to perhaps split the set of servers into sub-groups and assign weights to the sub-groups. The main property that a majority of servers satisfy that matters for the correctness of ZooKeeper is the intersection property of such subsets. As long as subsets of servers intersect, they can be used as a quorum system for ZooKeeper. In the current code, whenever we need a quorum, we just check if the number is greater than n/2, where n is the total number of servers configured. My initial proposal is to move such a computation to a separate class, call it QuorumSpec for now, that reads a quorum configuration from a configuration file, and provides a method to check if a set of elements contains a quorum, call it "containsQuorum". For example, if we want to check whether a set of votes when electing a leader contains a quorum, then we invoke "containsQuorum" on an instance of "QuorumSpec". With respect to configuration, the most flexible way would be to enumerate all possible subsets. Such an option is tedious and prone to configuration errors, though. Instead, I thought of providing for splitting the servers into groups, and assigning weights to the groups. The semantics would the following. We have a quorum if we have votes from a subset of groups such that the sum of the weights of such groups is more than half of the total sum of weights. Also, we include a group in the previous computation only if we have a majority of votes from such a group. To illustrate, consider the following example. Suppose that we have 4 groups of servers with weights in parenthesis: A (2), B (1), C (1), and D (1). Now suppose that we have 3 servers in each group. We have a quorum if, for example, we have votes from two processes from each one of A and B. As another example, we have a quorum if we have two votes from processes in each one of B, C, and D.
      Hide
      Flavio Junqueira added a comment -

      This is a preliminary patch. It is not functional, but it gives an idea of what I'm trying to do, so please comment if you have a chance to check it out.

      Show
      Flavio Junqueira added a comment - This is a preliminary patch. It is not functional, but it gives an idea of what I'm trying to do, so please comment if you have a chance to check it out.
      Hide
      Flavio Junqueira added a comment -

      This new patch is partially functional. FastLeaderElection and Leader use QuorumConfigMaj.

      It required a change to the server protocol because the leader currently doesn't have the identifiers of followers, which will be necessary when I add functionality to read quorum configuration from a file.

      Show
      Flavio Junqueira added a comment - This new patch is partially functional. FastLeaderElection and Leader use QuorumConfigMaj. It required a change to the server protocol because the leader currently doesn't have the identifiers of followers, which will be necessary when I add functionality to read quorum configuration from a file.
      Hide
      Mahadev konar added a comment -

      flavio,
      i just took a cursory look at the patch... i saw that you added an interface for config and another configuration QuorumConfigMaj. I was thinking of getting rid of all the config stuff and putting it into just a single serverconfig or something. Would it be possible for you to get rid of the these new configs and use the other configs already existing?

      Show
      Mahadev konar added a comment - flavio, i just took a cursory look at the patch... i saw that you added an interface for config and another configuration QuorumConfigMaj. I was thinking of getting rid of all the config stuff and putting it into just a single serverconfig or something. Would it be possible for you to get rid of the these new configs and use the other configs already existing?
      Hide
      Flavio Junqueira added a comment -

      Mahadev, I'm ok with moving the interface and the implementations of the interface elsewhere, but I'm not sure I understand your proposal. On my side, I need an interface to represent any implementation that takes a set of servers and verifies if that set is a quorum or not. I currently have two implementations: one for majority and one for loading a quorum system from a file in the way described above. The implementation of the second one is not complete yet. In fact, I haven't implemented most of it.

      I find mixing this quorum representation with the current configuration a little messy, especially because it is an optional feature and it does correspond to a simple value loaded from a file. We need some logic to perform quorum verification aside from values that we load from a separate configuration file.

      Show
      Flavio Junqueira added a comment - Mahadev, I'm ok with moving the interface and the implementations of the interface elsewhere, but I'm not sure I understand your proposal. On my side, I need an interface to represent any implementation that takes a set of servers and verifies if that set is a quorum or not. I currently have two implementations: one for majority and one for loading a quorum system from a file in the way described above. The implementation of the second one is not complete yet. In fact, I haven't implemented most of it. I find mixing this quorum representation with the current configuration a little messy, especially because it is an optional feature and it does correspond to a simple value loaded from a file. We need some logic to perform quorum verification aside from values that we load from a separate configuration file.
      Hide
      Stu Hood added a comment -

      Two comments:


      This ticket seems to overlap with ZOOKEEPER-368 to some degree: wouldn't an Observer simply be a server in the quorum that wouldn't get to vote on proposed values?


      I find mixing this quorum representation with the current configuration a little messy, especially because it is an optional feature and it does correspond to a simple value loaded from a file. We need some logic to perform quorum verification aside from values that we load from a separate configuration file.

      It could probably correspond to a key loaded from a flat file, if the key was hierarchial. It would require a bit of a config overhaul: rather than

      server.1=ahost:2181:2888
      

      .. you'd want

      server.1.hostname=ahost
      server.1.port=2181
      server.1.otherport=2888
      server.1.weight=3
      

      ...so then weight is optional, which defaults to an equal weight for all servers.

      Show
      Stu Hood added a comment - Two comments: This ticket seems to overlap with ZOOKEEPER-368 to some degree: wouldn't an Observer simply be a server in the quorum that wouldn't get to vote on proposed values? I find mixing this quorum representation with the current configuration a little messy, especially because it is an optional feature and it does correspond to a simple value loaded from a file. We need some logic to perform quorum verification aside from values that we load from a separate configuration file. It could probably correspond to a key loaded from a flat file, if the key was hierarchial. It would require a bit of a config overhaul: rather than server.1=ahost:2181:2888 .. you'd want server.1.hostname=ahost server.1.port=2181 server.1.otherport=2888 server.1.weight=3 ...so then weight is optional, which defaults to an equal weight for all servers.
      Hide
      Flavio Junqueira added a comment -

      Thanks for you comments, Stu. The features of ZOOKEEPER-368 and this one are different. You are right in that one way of implementing observers is to assign a weight of 0, assuming that we have a weight scheme as you propose. However, my current preference is to have observers not voting at all and being able to connect to followers.

      Your proposal for adding weights is certainly an option. However, if you read my initial proposal, I proposed something slightly more elaborate than the scheme you describe. And, because there are potentially several different ways we can express quorums, I thought it would be good to have a separate file describing a quorum system to leave it open to multiple implementations. We can then write different parsers and even having them co-existing, using a flag to select which one to use.

      Of course, we can mix all this with the current server configuration, but I think that setting the parameters of a server and declaring quorums are different concerns and should be in separate files. I understand that this is a matter of preference, though.

      Show
      Flavio Junqueira added a comment - Thanks for you comments, Stu. The features of ZOOKEEPER-368 and this one are different. You are right in that one way of implementing observers is to assign a weight of 0, assuming that we have a weight scheme as you propose. However, my current preference is to have observers not voting at all and being able to connect to followers. Your proposal for adding weights is certainly an option. However, if you read my initial proposal, I proposed something slightly more elaborate than the scheme you describe. And, because there are potentially several different ways we can express quorums, I thought it would be good to have a separate file describing a quorum system to leave it open to multiple implementations. We can then write different parsers and even having them co-existing, using a flag to select which one to use. Of course, we can mix all this with the current server configuration, but I think that setting the parameters of a server and declaring quorums are different concerns and should be in separate files. I understand that this is a matter of preference, though.
      Hide
      Mahadev konar added a comment -

      flavio,
      can you elaborate on what else do you have in mind other than using weights for which the parsing/config interface would be useful? It would be great to list some examples on this jira.

      Show
      Mahadev konar added a comment - flavio, can you elaborate on what else do you have in mind other than using weights for which the parsing/config interface would be useful? It would be great to list some examples on this jira.
      Hide
      Flavio Junqueira added a comment -

      This patch implements most of what I had in mind. I have changed the names of the classes I already had previously in an attempt to make more clear what they do. Also, I haven't tested extensively, and I need to write a unit test.

      After working out the details of the hierarchical construction I wanted to implement. I realized that perhaps you guys are right in that I should try to read the quorum description from QuorumPeerConfig because I ended up duplicating some code. In any case, check it out and give me feedback.

      Show
      Flavio Junqueira added a comment - This patch implements most of what I had in mind. I have changed the names of the classes I already had previously in an attempt to make more clear what they do. Also, I haven't tested extensively, and I need to write a unit test. After working out the details of the hierarchical construction I wanted to implement. I realized that perhaps you guys are right in that I should try to read the quorum description from QuorumPeerConfig because I ended up duplicating some code. In any case, check it out and give me feedback.
      Hide
      Flavio Junqueira added a comment -

      Here is a simple example of a hierarchical quorum construction (this is an example of a configuration file as it is currently implemented):

      group.1=1:2:3
      group.2=4:5:6
      group.3=7:8:9
      
      weight.1=1
      weight.2=1
      weight.3=1
      weight.4=1
      weight.5=1
      weight.6=1
      weight.7=1
      weight.8=1
      weight.9=1
      
      Show
      Flavio Junqueira added a comment - Here is a simple example of a hierarchical quorum construction (this is an example of a configuration file as it is currently implemented): group.1=1:2:3 group.2=4:5:6 group.3=7:8:9 weight.1=1 weight.2=1 weight.3=1 weight.4=1 weight.5=1 weight.6=1 weight.7=1 weight.8=1 weight.9=1
      Hide
      Flavio Junqueira added a comment -

      I have written a unit test and changed the implementation to read the quorum configuration from QuorumPeerConfig as all the other parameters.

      Show
      Flavio Junqueira added a comment - I have written a unit test and changed the implementation to read the quorum configuration from QuorumPeerConfig as all the other parameters.
      Hide
      Flavio Junqueira added a comment -

      I'd like to replace QuorumValidator with QuorumVerifier. It is just a small modification.

      Show
      Flavio Junqueira added a comment - I'd like to replace QuorumValidator with QuorumVerifier. It is just a small modification.
      Hide
      Flavio Junqueira added a comment -

      Uploading a new patch with the name of the interface changed.

      Show
      Flavio Junqueira added a comment - Uploading a new patch with the name of the interface changed.
      Hide
      Mahadev konar added a comment -

      some questions on the patch

      • is it backwards compatible in the sense that if I do not specify groups and weights in the config then it will work as it used to?
      • also in the example that you mentioned above shouldnt the group size be atleast 5 since you have equal weights and 9 servers?
      Show
      Mahadev konar added a comment - some questions on the patch is it backwards compatible in the sense that if I do not specify groups and weights in the config then it will work as it used to? also in the example that you mentioned above shouldnt the group size be atleast 5 since you have equal weights and 9 servers?
      Hide
      Mahadev konar added a comment -

      also,

      • why not just have groups and equal weights for first cut? wouldnt that suffice for the scenario of co located zookeeper servers?
      Show
      Mahadev konar added a comment - also, why not just have groups and equal weights for first cut? wouldnt that suffice for the scenario of co located zookeeper servers?
      Hide
      Flavio Junqueira added a comment -

      Mahadev, thanks for checking it out. Here are answers to your questions:

      1- With this patch, it defaults to majority if there is no declaration of groups and weights. We currently use majority quorums;
      2- No, the example is fine. You have a quorum if you have at least two servers from two different groups;
      3- I don't see a reason for having groups in a first cut and weights in a second. In fact, one typical way of specifying quorums is to assign weights, so even if a user is not interested in groups, the user can declare one group and assign weights.

      Show
      Flavio Junqueira added a comment - Mahadev, thanks for checking it out. Here are answers to your questions: 1- With this patch, it defaults to majority if there is no declaration of groups and weights. We currently use majority quorums; 2- No, the example is fine. You have a quorum if you have at least two servers from two different groups; 3- I don't see a reason for having groups in a first cut and weights in a second. In fact, one typical way of specifying quorums is to assign weights, so even if a user is not interested in groups, the user can declare one group and assign weights.
      Hide
      Flavio Junqueira added a comment -

      Just to motivate a little more this scheme, taking the example above, suppose that each group corresponds to the set of ZooKeeper servers in a colocation. To form a quorum, we just need to get a majority of servers from a majority of groups. This gives us quorums of size 4, whereas majority quorums would require 5. This model is reasonable, though, as it is reasonable to expect that at least a majority of colocations have a majority of servers up and running.

      This idea actually appeared in the following citation (apologies for the self-reference): http://dslab.epfl.ch/hotdep/2005/papers/junqueira_dependent.pdf

      Another question that may arise is how leader election changes. With this scheme it only changes minimally. In the current trunk code, we always wait for a majority of votes, and that's because we have majority quorums hardcoded. With this patch, we check if we have a quorum, and a quorum can be a majority, as we have currently, or a different quorum system as the one of the example.

      Show
      Flavio Junqueira added a comment - Just to motivate a little more this scheme, taking the example above, suppose that each group corresponds to the set of ZooKeeper servers in a colocation. To form a quorum, we just need to get a majority of servers from a majority of groups. This gives us quorums of size 4, whereas majority quorums would require 5. This model is reasonable, though, as it is reasonable to expect that at least a majority of colocations have a majority of servers up and running. This idea actually appeared in the following citation (apologies for the self-reference): http://dslab.epfl.ch/hotdep/2005/papers/junqueira_dependent.pdf Another question that may arise is how leader election changes. With this scheme it only changes minimally. In the current trunk code, we always wait for a majority of votes, and that's because we have majority quorums hardcoded. With this patch, we check if we have a quorum, and a quorum can be a majority, as we have currently, or a different quorum system as the one of the example.
      Hide
      Benjamin Reed added a comment -

      this looks really good.

      you build an ackToString String, but don't use it. it's probably a left over debugging.

      i'm wondering if it is a good idea to let the QuorumVerifier to be changed while the service is up. perhaps we should throw an exception in that case.

      could you document the configuration of the hierarchical QuorumVerifier? it would also be good to expand the description in the javadoc at the top of QuorumHierarchical?

      Show
      Benjamin Reed added a comment - this looks really good. you build an ackToString String, but don't use it. it's probably a left over debugging. i'm wondering if it is a good idea to let the QuorumVerifier to be changed while the service is up. perhaps we should throw an exception in that case. could you document the configuration of the hierarchical QuorumVerifier? it would also be good to expand the description in the javadoc at the top of QuorumHierarchical?
      Hide
      Flavio Junqueira added a comment -

      Canceling patch to implement suggestions.

      Show
      Flavio Junqueira added a comment - Canceling patch to implement suggestions.
      Hide
      Flavio Junqueira added a comment -

      New patch with comments addressed.

      Show
      Flavio Junqueira added a comment - New patch with comments addressed.
      Hide
      Flavio Junqueira added a comment -

      > you build an ackToString String, but don't use it. it's probably a left over debugging.

      I forgot to add to add the string to the message following it. I have fixed it in the new patch.

      > i'm wondering if it is a good idea to let the QuorumVerifier to be changed while the service
      > is up. perhaps we should throw an exception in that case.

      If I interpret your comment correctly, you're concerned about calls to QuorumPeer::setQuorumVerifier after initialization. As far as I can tell, we have the same issue with other variables. For example, if we change the quorum port on-the-fly, we might get into trouble. Actually, changing the QuorumVerifier on-the-fly doesn't seem to cause a server to stop operating. The only issue I can see is that a server may have a different notion of quorums, but I think this can happen even if we enforce that we set the QuorumVerifier only during initialization. We guarantee that servers have consistent configuration files, right?

      If you think this is a general issue with initialization, we should perhaps open a different jira and address this problem there?

      > could you document the configuration of the hierarchical QuorumVerifier? it would also be
      > good to expand the description in the javadoc at the top of QuorumHierarchical?

      Done. Let me know if you think it is sufficient.

      Show
      Flavio Junqueira added a comment - > you build an ackToString String, but don't use it. it's probably a left over debugging. I forgot to add to add the string to the message following it. I have fixed it in the new patch. > i'm wondering if it is a good idea to let the QuorumVerifier to be changed while the service > is up. perhaps we should throw an exception in that case. If I interpret your comment correctly, you're concerned about calls to QuorumPeer::setQuorumVerifier after initialization. As far as I can tell, we have the same issue with other variables. For example, if we change the quorum port on-the-fly, we might get into trouble. Actually, changing the QuorumVerifier on-the-fly doesn't seem to cause a server to stop operating. The only issue I can see is that a server may have a different notion of quorums, but I think this can happen even if we enforce that we set the QuorumVerifier only during initialization. We guarantee that servers have consistent configuration files, right? If you think this is a general issue with initialization, we should perhaps open a different jira and address this problem there? > could you document the configuration of the hierarchical QuorumVerifier? it would also be > good to expand the description in the javadoc at the top of QuorumHierarchical? Done. Let me know if you think it is sufficient.
      Hide
      Benjamin Reed added a comment -

      +1 looks good. marking patch available so that qa can run against it.

      Show
      Benjamin Reed added a comment - +1 looks good. marking patch available so that qa can run against it.
      Hide
      Mahadev konar added a comment - - edited

      sorry to be commenting late on the patch flavio,

      • minor nit - there is an unneccesary change in AuthFastLeaderElection.java
      • I see changes to FastLeaderElection but no changes in LeaderElection. Shouldnt both of them change.
      • also this breaks backward comtability with servers, should this go into 4.0? or is 3.2 fine?
      • isnt the sid sent only once so why do we have that in a loop -
             break;
                      case Leader.SID:
                          bb = ByteBuffer.wrap(qp.getData());
                          this.sid = bb.getLong();
                          
                          break;  
      

      do we expect the followers to keep sending the SID?

      • I think this code is wrong in Leader.java (line 297 or so)
       HashSet<Long> followerSet = new HashSet<Long>();
                          for(FollowerHandler f : followers)
                              followerSet.add(f.getSid());
                          
                          if (self.getQuorumVerifier().containsQuorum(followerSet)) {
                          //if (followers.size() >= self.quorumPeers.size() / 2) {
                              LOG.warn("Enough followers present. "+
                                      "Perhaps the init 
       
      

      The reason being the sid is always set to 0 even if the follower hasnt been able to be synced with the leader
      and sent back the sid – which is the above case. So mostly the above statement would never execute since
      the sid is only sent after the follower synchronizes the database. Meaning that your code would not be able
      to estimate if you had the right set of followers with the SID. Why isnt the SID sent as the first packet when
      syncing up with the leader as a payload? Why do we need another protocol change?

      • also it would be good to have checking in groups and weights (mostly for mistakes that people would make). You can see
        that it gets really easy to make mistakes with this configuration pattern. We should have more error checking (like serverid belongs
        to some unique group) and also in cases like
        if((serverWeight.size() > 0) &&
                            (serverGroup.size() > 0) && 
                            (servers.size() == serverGroup.size())){
        

      where only one or two of the above statements are true, we should fail saying that some problem on the config files rather than
      resorting to majority config. We can do it in seperate jira.

      • sorry for being naive, you mentioned in the comments that "This gives us quorums of size 4, whereas majority quorums would require 5." . What are
        the implications of this? Meaning is their anycase where the servers can diverge on the databases?
      Show
      Mahadev konar added a comment - - edited sorry to be commenting late on the patch flavio, minor nit - there is an unneccesary change in AuthFastLeaderElection.java I see changes to FastLeaderElection but no changes in LeaderElection. Shouldnt both of them change. also this breaks backward comtability with servers, should this go into 4.0? or is 3.2 fine? isnt the sid sent only once so why do we have that in a loop - break ; case Leader.SID: bb = ByteBuffer.wrap(qp.getData()); this .sid = bb.getLong(); break ; do we expect the followers to keep sending the SID? I think this code is wrong in Leader.java (line 297 or so) HashSet< Long > followerSet = new HashSet< Long >(); for (FollowerHandler f : followers) followerSet.add(f.getSid()); if (self.getQuorumVerifier().containsQuorum(followerSet)) { // if (followers.size() >= self.quorumPeers.size() / 2) { LOG.warn( "Enough followers present. " + "Perhaps the init The reason being the sid is always set to 0 even if the follower hasnt been able to be synced with the leader and sent back the sid – which is the above case. So mostly the above statement would never execute since the sid is only sent after the follower synchronizes the database. Meaning that your code would not be able to estimate if you had the right set of followers with the SID. Why isnt the SID sent as the first packet when syncing up with the leader as a payload? Why do we need another protocol change? also it would be good to have checking in groups and weights (mostly for mistakes that people would make). You can see that it gets really easy to make mistakes with this configuration pattern. We should have more error checking (like serverid belongs to some unique group) and also in cases like if ((serverWeight.size() > 0) && (serverGroup.size() > 0) && (servers.size() == serverGroup.size())){ where only one or two of the above statements are true, we should fail saying that some problem on the config files rather than resorting to majority config. We can do it in seperate jira. sorry for being naive, you mentioned in the comments that "This gives us quorums of size 4, whereas majority quorums would require 5." . What are the implications of this? Meaning is their anycase where the servers can diverge on the databases?
      Hide
      Mahadev konar added a comment -

      also we should have (forrest) documentation for the new configuration style for groups and servers.

      Show
      Mahadev konar added a comment - also we should have (forrest) documentation for the new configuration style for groups and servers.
      Hide
      Flavio Junqueira added a comment -

      I forgot to say that with the new patch, we assign a weight of 1 by default to servers. That is, if a user defines groups, but not the weights, then servers will get a weight of 1.

      Show
      Flavio Junqueira added a comment - I forgot to say that with the new patch, we assign a weight of 1 by default to servers. That is, if a user defines groups, but not the weights, then servers will get a weight of 1.
      Hide
      Mahadev konar added a comment -

      I am +1 on putting in 3.2 since it does not break any compatibility with database and clients (only that 3.1.1 servers cannot run with 3.2 servers) which I think is fine.

      Show
      Mahadev konar added a comment - I am +1 on putting in 3.2 since it does not break any compatibility with database and clients (only that 3.1.1 servers cannot run with 3.2 servers) which I think is fine.
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12405640/ZOOKEEPER-29.patch
      against trunk revision 765797.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 21 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      -1 findbugs. The patch appears to introduce 3 new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405640/ZOOKEEPER-29.patch against trunk revision 765797. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 3 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/36/console This message is automatically generated.
      Hide
      Flavio Junqueira added a comment -

      Thanks for your comments, Mahadev. Here are my response based on our discussion:

      1- Using this new feature with FastLeaderElection is easier because the protocol already exchanges server ids. It will require more modifications to move LeaderElection and AuthFastLeaderElection, so I propose to work on it in a different jira. If a user decides to use one of these implementations, then only majority quorums will be available;
      2- Mahadev suggested to change the LASTZXID message to FOLLOWERINFO and send both the lastzxid and the server id in the same message. The main reason for doing this is to avoid creating a new message every time we have to send a new piece of information. I think it is a good suggestion, so I'll implement it in the next patch;
      3- The SID case is in the loop just in case we receive a stray message. It shouldn't happen with this patch, so I'm ok with removing it. In fact, with modification of 2, that case will have to go anyway;
      4- We agreed that the problem pointed out in Leader.java:297 is not really a problem because SID is the first message sent. The same will happen with the modification of 2;
      5- Mahadev suggests that instead of falling back to majority if there is a problem with the group configuration, we should throw an exception. I'm ok with that suggestion;
      6- Although it may seem that using quorums of size 4 with 9 replicas may cause replicas to diverge, it is simple exercise to verify that all quorums constructed hierarchically in the way proposed are guaranteed to intersect. All we need to guarantee that replicas won't diverge is that quorums intersect. Majority quorums trivially intersect;
      7- I don't think the failure in the test is due to this patch. I looked over the results and it is due to testRecovery failing due to "unreasonable length". Isn't it the issue of ZOOKEEPER-367?

      Show
      Flavio Junqueira added a comment - Thanks for your comments, Mahadev. Here are my response based on our discussion: 1- Using this new feature with FastLeaderElection is easier because the protocol already exchanges server ids. It will require more modifications to move LeaderElection and AuthFastLeaderElection, so I propose to work on it in a different jira. If a user decides to use one of these implementations, then only majority quorums will be available; 2- Mahadev suggested to change the LASTZXID message to FOLLOWERINFO and send both the lastzxid and the server id in the same message. The main reason for doing this is to avoid creating a new message every time we have to send a new piece of information. I think it is a good suggestion, so I'll implement it in the next patch; 3- The SID case is in the loop just in case we receive a stray message. It shouldn't happen with this patch, so I'm ok with removing it. In fact, with modification of 2, that case will have to go anyway; 4- We agreed that the problem pointed out in Leader.java:297 is not really a problem because SID is the first message sent. The same will happen with the modification of 2; 5- Mahadev suggests that instead of falling back to majority if there is a problem with the group configuration, we should throw an exception. I'm ok with that suggestion; 6- Although it may seem that using quorums of size 4 with 9 replicas may cause replicas to diverge, it is simple exercise to verify that all quorums constructed hierarchically in the way proposed are guaranteed to intersect. All we need to guarantee that replicas won't diverge is that quorums intersect. Majority quorums trivially intersect; 7- I don't think the failure in the test is due to this patch. I looked over the results and it is due to testRecovery failing due to "unreasonable length". Isn't it the issue of ZOOKEEPER-367 ?
      Hide
      Flavio Junqueira added a comment -

      Re-submitting patch with suggested modifications.

      Show
      Flavio Junqueira added a comment - Re-submitting patch with suggested modifications.
      Hide
      Flavio Junqueira added a comment -

      Fixing the problems pointed by findbugs.

      Show
      Flavio Junqueira added a comment - Fixing the problems pointed by findbugs.
      Hide
      Benjamin Reed added a comment -

      i like the idea of doing the FOLLOWERINFO and adding the server id in there.

      Show
      Benjamin Reed added a comment - i like the idea of doing the FOLLOWERINFO and adding the server id in there.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12405754/ZOOKEEPER-29.patch
      against trunk revision 766107.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 22 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405754/ZOOKEEPER-29.patch against trunk revision 766107. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 22 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/39/console This message is automatically generated.
      Hide
      Mahadev konar added a comment -

      their is a very easy way to make this backwards compatible after we moved to the payload design. The payload would only be added if the server has itself configured to be group *. This way the new server can talk to the old servers and their wont be any opposition to getting this into 3.2 (instead of 4)...

      Show
      Mahadev konar added a comment - their is a very easy way to make this backwards compatible after we moved to the payload design. The payload would only be added if the server has itself configured to be group *. This way the new server can talk to the old servers and their wont be any opposition to getting this into 3.2 (instead of 4)...
      Hide
      Mahadev konar added a comment - - edited

      i realized that it wont be easy becasue the quorum verifuer and other would still need SID to function right..

      Show
      Mahadev konar added a comment - - edited i realized that it wont be easy becasue the quorum verifuer and other would still need SID to function right..
      Hide
      Patrick Hunt added a comment -

      Just a reminder, non-backward compatible changes will have to wait for 4.0. It would be great to figure out a way to support this for 3.2.

      Show
      Patrick Hunt added a comment - Just a reminder, non-backward compatible changes will have to wait for 4.0. It would be great to figure out a way to support this for 3.2.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12405754/ZOOKEEPER-29.patch
      against trunk revision 766160.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 22 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405754/ZOOKEEPER-29.patch against trunk revision 766160. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 22 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/40/console This message is automatically generated.
      Hide
      Flavio Junqueira added a comment -

      I made an attempt at making the protocol backward compatible. Basically, a FollowerHandler is able to interpret both FOLLOWERINFO message and LASTZXID. If it receives a LASTZXID message, it generates a unique id number that does not correspond to the server id, but ti doesn't matter if we are using majority because for majority we only need to count votes. Of course, if a user tries to use hierarchical quorums with servers running older versions, it won't work.

      Show
      Flavio Junqueira added a comment - I made an attempt at making the protocol backward compatible. Basically, a FollowerHandler is able to interpret both FOLLOWERINFO message and LASTZXID. If it receives a LASTZXID message, it generates a unique id number that does not correspond to the server id, but ti doesn't matter if we are using majority because for majority we only need to count votes. Of course, if a user tries to use hierarchical quorums with servers running older versions, it won't work.
      Hide
      Flavio Junqueira added a comment -

      I have realized that with the previous patch it wouldn't be backward compatible because a new follower would never send a LASTZXID, and an old leader wouldn't understand a FOLLOWERINFO message. This new patch gets rid of FOLLOWERINFO and only uses LASTZXID, and it seems to work fine.

      My preference is to stick with LASTZXID for now, if keeping it backward compatible is really necessary, and consider updating it for 4.0.

      Show
      Flavio Junqueira added a comment - I have realized that with the previous patch it wouldn't be backward compatible because a new follower would never send a LASTZXID, and an old leader wouldn't understand a FOLLOWERINFO message. This new patch gets rid of FOLLOWERINFO and only uses LASTZXID, and it seems to work fine. My preference is to stick with LASTZXID for now, if keeping it backward compatible is really necessary, and consider updating it for 4.0.
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12405927/ZOOKEEPER-29.patch
      against trunk revision 766160.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 21 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      -1 findbugs. The patch appears to cause Findbugs to fail.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/42/testReport/
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/42/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405927/ZOOKEEPER-29.patch against trunk revision 766160. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to cause Findbugs to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/42/testReport/ Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/42/console This message is automatically generated.
      Hide
      Flavio Junqueira added a comment -

      The previous patch didn't compile properly. I have fixed the problem and I'm submitting again.

      Show
      Flavio Junqueira added a comment - The previous patch didn't compile properly. I have fixed the problem and I'm submitting again.
      Hide
      Mahadev konar added a comment -

      I have realized that with the previous patch it wouldn't be backward compatible because a new follower would never send a LASTZXID, and an old leader wouldn't understand a FOLLOWERINFO message. This new patch gets rid of FOLLOWERINFO and only uses LASTZXID, and it seems to work fine.
      My preference is to stick with LASTZXID for now, if keeping it backward compatible is really necessary, and consider updating it for 4.0.

      great... this is what I was thinking and was going to propose.. I dont see any problems with changing it to FOLLOWERINFO in this patch (but its up to you)...

      Show
      Mahadev konar added a comment - I have realized that with the previous patch it wouldn't be backward compatible because a new follower would never send a LASTZXID, and an old leader wouldn't understand a FOLLOWERINFO message. This new patch gets rid of FOLLOWERINFO and only uses LASTZXID, and it seems to work fine. My preference is to stick with LASTZXID for now, if keeping it backward compatible is really necessary, and consider updating it for 4.0. great... this is what I was thinking and was going to propose.. I dont see any problems with changing it to FOLLOWERINFO in this patch (but its up to you)...
      Hide
      Mahadev konar added a comment -

      by chaning it to FOLLOWERINFO I meant just changing the name from LASTZXID to FOLLOWEINFO (keepeing the int value). So its just a name change for code readability and nothing else.

      Show
      Mahadev konar added a comment - by chaning it to FOLLOWERINFO I meant just changing the name from LASTZXID to FOLLOWEINFO (keepeing the int value). So its just a name change for code readability and nothing else.
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12405929/ZOOKEEPER-29.patch
      against trunk revision 766897.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 21 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12405929/ZOOKEEPER-29.patch against trunk revision 766897. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/44/console This message is automatically generated.
      Hide
      Flavio Junqueira added a comment -

      I'm replacing LASTZXID with FOLLOWERINFO. Also, I checked the results of QA and it seems to be failing on a C client test, which is unrelated to this issue. In fact, when I follow the link to the test results, it says that no test has failed, but if you look at the console output there is a failure for one of the C tests:

      [exec]      [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion
           [exec]      [exec] 
           [exec]      [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual  : 0]
           [exec]      [exec] Failures !!!
           [exec]      [exec] Run: 32   Failure total: 1   Failures: 1   Errors: 0
      
      Show
      Flavio Junqueira added a comment - I'm replacing LASTZXID with FOLLOWERINFO. Also, I checked the results of QA and it seems to be failing on a C client test, which is unrelated to this issue. In fact, when I follow the link to the test results, it says that no test has failed, but if you look at the console output there is a failure for one of the C tests: [exec] [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion [exec] [exec] [exec] [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual : 0] [exec] [exec] Failures !!! [exec] [exec] Run: 32 Failure total: 1 Failures: 1 Errors: 0
      Hide
      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12406013/ZOOKEEPER-29.patch
      against trunk revision 766897.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 22 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 core tests. The patch failed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12406013/ZOOKEEPER-29.patch against trunk revision 766897. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 22 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/45/console This message is automatically generated.
      Hide
      Flavio Junqueira added a comment -

      There is something strange with the test result. If we check the console output of test number 45 (the latest one for this patch), we find:

      [exec]      [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion
           [exec]      [exec] 
           [exec]      [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual  : 0]
           [exec]      [exec] Failures !!!
           [exec]      [exec] Run: 32   Failure total: 1   Failures: 1   Errors: 0
           [exec]      [exec] make: *** [run-check] Error 1
           [exec] 
           [exec] BUILD FAILED
           [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/build.xml:612: The following error occurred while executing this line:
           [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/build.xml:622: exec returned: 2
           [exec] 
           [exec] Total time: 8 minutes 52 seconds
      

      This same failure also appears in the trunk test, but it doesn't cause the whole check to fail:

      [exec] /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual  : 0]
           [exec] Failures !!!
           [exec] Run: 32   Failure total: 1   Failures: 1   Errors: 0
           [exec] make: *** [run-check] Error 1
           [exec] Result: 2
      
      test-core-cppunit:
      
      test-core:
      
      test-contrib:
      
      BUILD SUCCESSFUL
      Total time: 9 minutes 32 seconds
      Recording fingerprints
      Publishing Javadoc
      Recording test results
      finished: SUCCESS
      

      Am I reading it wrong?

      Show
      Flavio Junqueira added a comment - There is something strange with the test result. If we check the console output of test number 45 (the latest one for this patch), we find: [exec] [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset : assertion [exec] [exec] [exec] [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual : 0] [exec] [exec] Failures !!! [exec] [exec] Run: 32 Failure total: 1 Failures: 1 Errors: 0 [exec] [exec] make: *** [run-check] Error 1 [exec] [exec] BUILD FAILED [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/build.xml:612: The following error occurred while executing this line: [exec] /home/hudson/hudson-slave/workspace/Zookeeper-Patch-vesta.apache.org/trunk/build.xml:622: exec returned: 2 [exec] [exec] Total time: 8 minutes 52 seconds This same failure also appears in the trunk test, but it doesn't cause the whole check to fail: [exec] /home/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestClient.cc:273: Assertion: equality assertion failed [Expected: -101, Actual : 0] [exec] Failures !!! [exec] Run: 32 Failure total: 1 Failures: 1 Errors: 0 [exec] make: *** [run-check] Error 1 [exec] Result: 2 test-core-cppunit: test-core: test-contrib: BUILD SUCCESSFUL Total time: 9 minutes 32 seconds Recording fingerprints Publishing Javadoc Recording test results finished: SUCCESS Am I reading it wrong?
      Hide
      Patrick Hunt added a comment -

      Giri is working on getting cppunit integrated into the patch/build process on hudson.

      There was a recent change where cppunit failures used to be ignored, now they are not.
      Try pinging Giri.

      Show
      Patrick Hunt added a comment - Giri is working on getting cppunit integrated into the patch/build process on hudson. There was a recent change where cppunit failures used to be ignored, now they are not. Try pinging Giri.
      Hide
      Mahadev konar added a comment -

      this patch looks good. The only thing being -

      • QuorumPeer should have a new constructor that adds a parameter rather than replacing the old parameter for backwards compatibility

      other than that everything else looks good.

      Show
      Mahadev konar added a comment - this patch looks good. The only thing being - QuorumPeer should have a new constructor that adds a parameter rather than replacing the old parameter for backwards compatibility other than that everything else looks good.
      Hide
      Flavio Junqueira added a comment -

      Thanks for the suggestion, Mahadev. I have implemented your suggestion in this new patch. I have also done a couple of other things:

      1- I have fixed a bug on FollowerHandler. We should be checking if "qp.getData() == null" instead of checking if "qp.getData().length > 0". With the current protocol, a follower wont't send any data, and qp.getData will be null;
      2- I have removed ackCount from the Leader::Proposal class. It is not necessary given that we are keeping a set.

      All java tests pass for me.

      Show
      Flavio Junqueira added a comment - Thanks for the suggestion, Mahadev. I have implemented your suggestion in this new patch. I have also done a couple of other things: 1- I have fixed a bug on FollowerHandler. We should be checking if "qp.getData() == null" instead of checking if "qp.getData().length > 0". With the current protocol, a follower wont't send any data, and qp.getData will be null; 2- I have removed ackCount from the Leader::Proposal class. It is not necessary given that we are keeping a set. All java tests pass for me.
      Hide
      Hadoop QA added a comment -

      +1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12406121/ZOOKEEPER-29.patch
      against trunk revision 766897.

      +1 @author. The patch does not contain any @author tags.

      +1 tests included. The patch appears to include 3 new or modified tests.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 findbugs. The patch does not introduce any new Findbugs warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 core tests. The patch passed core unit tests.

      +1 contrib tests. The patch passed contrib unit tests.

      Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/testReport/
      Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
      Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/console

      This message is automatically generated.

      Show
      Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12406121/ZOOKEEPER-29.patch against trunk revision 766897. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-vesta.apache.org/47/console This message is automatically generated.
      Hide
      Mahadev konar added a comment -

      +1 to the patch...
      minor nits (though I took care of it)

      • avoid using tabs in forrest docs
      • run forrest before you submit doc changes

      I just committed this. thanks flavio.

      Show
      Mahadev konar added a comment - +1 to the patch... minor nits (though I took care of it) avoid using tabs in forrest docs run forrest before you submit doc changes I just committed this. thanks flavio.
      Hide
      Hudson added a comment -

      Integrated in ZooKeeper-trunk #290 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/290/)
      . Flexible quorums (flavio via mahadev)

      Show
      Hudson added a comment - Integrated in ZooKeeper-trunk #290 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/290/ ) . Flexible quorums (flavio via mahadev)

        People

        • Assignee:
          Flavio Junqueira
          Reporter:
          Patrick Hunt
        • Votes:
          0 Vote for this issue
          Watchers:
          0 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development