Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-1324

Remove Duplicate NEWLEADER packets from the Leader to the Follower.

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.4.6, 3.5.0
    • Component/s: quorum
    • Labels:
      None
    1. ZOOKEEPER-1324.patch
      18 kB
      Flavio Junqueira
    2. ZOOKEEPER-1324.patch
      18 kB
      Flavio Junqueira
    3. ZOOKEEPER-1324.patch
      18 kB
      Flavio Junqueira
    4. ZOOKEEPER-1324.patch
      17 kB
      Flavio Junqueira
    5. ZOOKEEPER-1324.patch
      12 kB
      Thawan Kooburat
    6. ZOOKEEPER-1324.patch
      12 kB
      Thawan Kooburat
    7. ZOOKEEPER-1324.patch
      11 kB
      Thawan Kooburat
    8. ZOOKEEPER-1324.patch
      8 kB
      Flavio Junqueira
    9. ZOOKEEPER-1324.patch
      2 kB
      Thawan Kooburat
    10. ZOOKEEPER-1324-branch-3.4.patch
      12 kB
      Flavio Junqueira
    11. ZOOKEEPER-1324-trunk-committed.patch
      19 kB
      Flavio Junqueira

      Issue Links

        Activity

        Hide
        thawan Thawan Kooburat added a comment -

        no-whitespace-changes version of the patch (so indentation may not be correct)

        Show
        thawan Thawan Kooburat added a comment - no-whitespace-changes version of the patch (so indentation may not be correct)
        Hide
        thawan Thawan Kooburat added a comment -

        The same patch but with the whitespace changes (by Eclipse)

        Show
        thawan Thawan Kooburat added a comment - The same patch but with the whitespace changes (by Eclipse)
        Hide
        thawan Thawan Kooburat added a comment -

        I was tracking down why I saw some followers taking snapshot 2 times during quorum forming and found that this is the problem. The fix is according to what Flavio suggested in ZOOKEEPER-1319

        Show
        thawan Thawan Kooburat added a comment - I was tracking down why I saw some followers taking snapshot 2 times during quorum forming and found that this is the problem. The fix is according to what Flavio suggested in ZOOKEEPER-1319
        Hide
        thawan Thawan Kooburat added a comment -

        version with the right whitespace

        Show
        thawan Thawan Kooburat added a comment - version with the right whitespace
        Hide
        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12563186/ZOOKEEPER-1324.patch
        against trunk revision 1427034.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12563186/ZOOKEEPER-1324.patch against trunk revision 1427034. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1319//console This message is automatically generated.
        Hide
        eribeiro Edward Ribeiro added a comment -

        Thawan, have you looked at ZOOKEEPER-1498? It seems to be a duplicate of this issue. See if it's okay to mark it as duplicate, please?

        Show
        eribeiro Edward Ribeiro added a comment - Thawan, have you looked at ZOOKEEPER-1498 ? It seems to be a duplicate of this issue. See if it's okay to mark it as duplicate, please?
        Hide
        fpj Flavio Junqueira added a comment -

        I understand I suggested this solution previously, but I find a bit hacky. Let me think some more about this and see if I can come up with something better. It would be great if we could just use the one the leader is queuing already. In the case we end up committing this patch, I suggest we add a comment explaining why we have that if statement. Otherwise, in the future, we might end up wondering what it is doing there.

        Show
        fpj Flavio Junqueira added a comment - I understand I suggested this solution previously, but I find a bit hacky. Let me think some more about this and see if I can come up with something better. It would be great if we could just use the one the leader is queuing already. In the case we end up committing this patch, I suggest we add a comment explaining why we have that if statement. Otherwise, in the future, we might end up wondering what it is doing there.
        Hide
        fpj Flavio Junqueira added a comment -

        What about this change? I still need to clean up the patch, so it is not final yet, but the change I'm proposing can be reviewed.

        Show
        fpj Flavio Junqueira added a comment - What about this change? I still need to clean up the patch, so it is not final yet, but the change I'm proposing can be reviewed.
        Hide
        thawan Thawan Kooburat added a comment -

        I understand the fact that you want to treat NEWLEADER like another proposal and reuse existing facility to handle it. However, treating this like a normal proposal also comes with some issues.
        1. Currently procesAck() have to a special logic to handle NEWLEADER proposal, it need to start up ZK server
        2. We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed.
        3. There is no need to for observers to see outstanding proposals. This is minor optimization, but we don't send outstanding proposals to the observer in startForwarding() in our internal branch.

        If we really want to clean this up, I suggested that we do it another way around. I think queuing NEWLEADER packet into the packet queue explicitly make the code easier to follow. Additional, we may factor out NEWLEADER to use its own method for collecting the votes similar to waitForEpochAck(), so that we don't overload processAck() and outstanding proposals to do multiple things. Let me know what you think

        Show
        thawan Thawan Kooburat added a comment - I understand the fact that you want to treat NEWLEADER like another proposal and reuse existing facility to handle it. However, treating this like a normal proposal also comes with some issues. 1. Currently procesAck() have to a special logic to handle NEWLEADER proposal, it need to start up ZK server 2. We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed. 3. There is no need to for observers to see outstanding proposals. This is minor optimization, but we don't send outstanding proposals to the observer in startForwarding() in our internal branch. If we really want to clean this up, I suggested that we do it another way around. I think queuing NEWLEADER packet into the packet queue explicitly make the code easier to follow. Additional, we may factor out NEWLEADER to use its own method for collecting the votes similar to waitForEpochAck(), so that we don't overload processAck() and outstanding proposals to do multiple things. Let me know what you think
        Hide
        fpj Flavio Junqueira added a comment -

        Having a different method for collecting the ack and not overloading processAck sounds great. On adding NEWLEADER to the packet queue, I need a confirmation to make sure I understand your proposal. I think you're suggesting that we replace adding a new leader proposal to outstandingProposals in #lead() with a barrier method that waits for enough acks to return, but we keep adding NEWLEADER to the queue in LearnerHandler as we have today.

        Show
        fpj Flavio Junqueira added a comment - Having a different method for collecting the ack and not overloading processAck sounds great. On adding NEWLEADER to the packet queue, I need a confirmation to make sure I understand your proposal. I think you're suggesting that we replace adding a new leader proposal to outstandingProposals in #lead() with a barrier method that waits for enough acks to return, but we keep adding NEWLEADER to the queue in LearnerHandler as we have today.
        Hide
        thawan Thawan Kooburat added a comment -

        This is the first revision of the fix. Below are some thought about the diff

        1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the quorum to form prematurely. I will submit a unit test in a separate jira

        2. I just noticed that the learner send ACK back to leader for both NEWLEADER and UPTODATE. I don't think UPTODATE's ack is part of the protocol, but didn't want to change it

        3. After LearnerHandler call waitForNewLeaderAck(), it still wait and check if ZK server is started or not. I don't think we need that check anymore but it doesn't hurt to leave it as well.

        Show
        thawan Thawan Kooburat added a comment - This is the first revision of the fix. Below are some thought about the diff 1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the quorum to form prematurely. I will submit a unit test in a separate jira 2. I just noticed that the learner send ACK back to leader for both NEWLEADER and UPTODATE. I don't think UPTODATE's ack is part of the protocol, but didn't want to change it 3. After LearnerHandler call waitForNewLeaderAck(), it still wait and check if ZK server is started or not. I don't think we need that check anymore but it doesn't hurt to leave it as well.
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12564047/ZOOKEEPER-1324.patch
        against trunk revision 1427034.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564047/ZOOKEEPER-1324.patch against trunk revision 1427034. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1327//console This message is automatically generated.
        Hide
        thawan Thawan Kooburat added a comment -

        Fix findbugs warning and check that NEWLEADER's ACK is from the correct epoch

        Show
        thawan Thawan Kooburat added a comment - Fix findbugs warning and check that NEWLEADER's ACK is from the correct epoch
        Hide
        fpj Flavio Junqueira added a comment -

        It looks pretty good, Thawan, thanks for coming up with it fast.

        I was planning on removing that ack in learner handler after receiving UPTODATE. I believe it is there for backward compatibility with the 3.3 branch.

        I was also wondering about the synchronization of startZKServer. That whole block of code was previously synchronized in processAck, but if I'm reading it right only lastCommitted can have concurrent accesses in that block, is it right?

        Show
        fpj Flavio Junqueira added a comment - It looks pretty good, Thawan, thanks for coming up with it fast. I was planning on removing that ack in learner handler after receiving UPTODATE. I believe it is there for backward compatibility with the 3.3 branch. I was also wondering about the synchronization of startZKServer. That whole block of code was previously synchronized in processAck, but if I'm reading it right only lastCommitted can have concurrent accesses in that block, is it right?
        Hide
        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12564058/ZOOKEEPER-1324.patch
        against trunk revision 1427034.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564058/ZOOKEEPER-1324.patch against trunk revision 1427034. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1328//console This message is automatically generated.
        Hide
        thawan Thawan Kooburat added a comment -

        synchronization of lastCommitted is based on findbugs warning.

        My understanding is that leader object is used to lock the entire request processing pipeline. So it might be safer to do synchronized the entire method in order to guarantee that lastCommitted and DB's zxid is set atomically when the server start (and the system start processing request).

        Show
        thawan Thawan Kooburat added a comment - synchronization of lastCommitted is based on findbugs warning. My understanding is that leader object is used to lock the entire request processing pipeline. So it might be safer to do synchronized the entire method in order to guarantee that lastCommitted and DB's zxid is set atomically when the server start (and the system start processing request).
        Hide
        thawan Thawan Kooburat added a comment -

        Fix synchronization and print list of supporting followers when quorum is formed

        Show
        thawan Thawan Kooburat added a comment - Fix synchronization and print list of supporting followers when quorum is formed
        Hide
        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12564118/ZOOKEEPER-1324.patch
        against trunk revision 1427034.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12564118/ZOOKEEPER-1324.patch against trunk revision 1427034. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1329//console This message is automatically generated.
        Hide
        fpj Flavio Junqueira added a comment -

        Thanks, Thawan. +1, it looks good to me.

        Show
        fpj Flavio Junqueira added a comment - Thanks, Thawan. +1, it looks good to me.
        Hide
        fpj Flavio Junqueira added a comment -

        Committed revision 1433651.

        Show
        fpj Flavio Junqueira added a comment - Committed revision 1433651.
        Hide
        hudson Hudson added a comment -

        Integrated in ZooKeeper-trunk #1804 (See https://builds.apache.org/job/ZooKeeper-trunk/1804/)
        ZOOKEEPER-1324. Remove Duplicate NEWLEADER packets from the Leader to the Follower. (thawan via fpj) (Revision 1433651)

        Result = SUCCESS
        fpj : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1433651
        Files :

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
        Show
        hudson Hudson added a comment - Integrated in ZooKeeper-trunk #1804 (See https://builds.apache.org/job/ZooKeeper-trunk/1804/ ) ZOOKEEPER-1324 . Remove Duplicate NEWLEADER packets from the Leader to the Follower. (thawan via fpj) (Revision 1433651) Result = SUCCESS fpj : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1433651 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
        Hide
        shralex Alexander Shraer added a comment -

        Hi Thawan, Flavio and Patrick.

        Unfortunately, this patch (committed yesterday) creates a big problem for me. In order to merge ZK-107 with it I basically need to re-implement ZK-1324, and I don't feel I'm the expert on ZK-1324. Since ZK-107 is a much much bigger patch, and since it is ready to be committed for months now, I suggest to roll back ZK-1324, commit ZK-107 and have ZK-1324 re-implemented on top by someone who understand what this patch is about. I'm really sorry about this Thawan, its not your fault, and I can help re-implementing it, I just think that playing continuous catchup having me re-implement such subtle code committed by others like in ZK-1324 is a very bad idea...

        Best Regards,
        Alex

        Show
        shralex Alexander Shraer added a comment - Hi Thawan, Flavio and Patrick. Unfortunately, this patch (committed yesterday) creates a big problem for me. In order to merge ZK-107 with it I basically need to re-implement ZK-1324, and I don't feel I'm the expert on ZK-1324. Since ZK-107 is a much much bigger patch, and since it is ready to be committed for months now, I suggest to roll back ZK-1324, commit ZK-107 and have ZK-1324 re-implemented on top by someone who understand what this patch is about. I'm really sorry about this Thawan, its not your fault, and I can help re-implementing it, I just think that playing continuous catchup having me re-implement such subtle code committed by others like in ZK-1324 is a very bad idea... Best Regards, Alex
        Hide
        fpj Flavio Junqueira added a comment - - edited

        Hi Alex,

        This is indeed kind of unfortunate. I gave it a number of days between +1'ing it and committing so that people could also review and comment. No one did, so I committed it. In general, I agree that you don't have to understand what every single patch that touches zk core does in detail, but it would be great if you could scan the list for issues that could affect ZOOKEEPER-107 and flag it by linking the jiras. There are other jiras coming that could affect ZOOKEEPER-107, like ZOOKEEPER-1549, ZOOKEEPER-1551, ZOOKEEPER-1552, so we need to keep it in mind. We haven't been doing a good job with marking the components that jiras affect, so I'll try to be more careful about that.

        I'd rather not revert this patch, but I'd say that we can help you port the patch. If Thawan doesn't have the time, I'll help you out.

        Show
        fpj Flavio Junqueira added a comment - - edited Hi Alex, This is indeed kind of unfortunate. I gave it a number of days between +1'ing it and committing so that people could also review and comment. No one did, so I committed it. In general, I agree that you don't have to understand what every single patch that touches zk core does in detail, but it would be great if you could scan the list for issues that could affect ZOOKEEPER-107 and flag it by linking the jiras. There are other jiras coming that could affect ZOOKEEPER-107 , like ZOOKEEPER-1549 , ZOOKEEPER-1551 , ZOOKEEPER-1552 , so we need to keep it in mind. We haven't been doing a good job with marking the components that jiras affect, so I'll try to be more careful about that. I'd rather not revert this patch, but I'd say that we can help you port the patch. If Thawan doesn't have the time, I'll help you out.
        Hide
        shralex Alexander Shraer added a comment -

        Hi Flavio,

        You're right, I should have paid closer attention. But I actually did notice this JIRA and wrote Patrick about two weeks ago that I'm concerned about several JIRAs touching the same code as ZK-107 and that we should commit ZK-107 ASAP. At some point I think we need a lock/freeze on significant other changes to commit ZK-107, otherwise its an endless catch-up.

        Regarding this specific patch. I'm not sure what it means to "port the patch". The code in Leader.java and a few other files that is changed by ZK-1324 was restructured and then modified in subtle ways ZK-107. I really believe that the easiest and cleanest way to merge is to have ZK-107 as a starting point and implement ZK-1324 relative to it (on top of it), and not the other way. It is of course possible that Thawan or you do that on your local machine (you can use the latest ZK-107 patch) but then the re-implementation of ZK-1324 will be part of ZK-107. I think this is not a good solution, since ZK-1324 will be spread across two JIRAs and it will be difficult to find problems later In both these JIRAs.

        Alex

        Show
        shralex Alexander Shraer added a comment - Hi Flavio, You're right, I should have paid closer attention. But I actually did notice this JIRA and wrote Patrick about two weeks ago that I'm concerned about several JIRAs touching the same code as ZK-107 and that we should commit ZK-107 ASAP. At some point I think we need a lock/freeze on significant other changes to commit ZK-107, otherwise its an endless catch-up. Regarding this specific patch. I'm not sure what it means to "port the patch". The code in Leader.java and a few other files that is changed by ZK-1324 was restructured and then modified in subtle ways ZK-107. I really believe that the easiest and cleanest way to merge is to have ZK-107 as a starting point and implement ZK-1324 relative to it (on top of it), and not the other way. It is of course possible that Thawan or you do that on your local machine (you can use the latest ZK-107 patch) but then the re-implementation of ZK-1324 will be part of ZK-107. I think this is not a good solution, since ZK-1324 will be spread across two JIRAs and it will be difficult to find problems later In both these JIRAs. Alex
        Hide
        fpj Flavio Junqueira added a comment -

        You should have written on the jira directly instead of going through a private channel with Pat. I'm of course aware of your effort, but it is easy to forget when it is not marked explicitly in the jira. Let me also say that if we had created a 107 branch as I suggested a while back, we could be marking jiras to produce versions of the patches for that branch.

        As for what to do now, I'm not sure what the best course of action is. I don't like the idea of reverting the commit, but at the same time I feel bad about giving you more work to do. I'd like to hear what the others involved in this discussion have to say about it.

        Show
        fpj Flavio Junqueira added a comment - You should have written on the jira directly instead of going through a private channel with Pat. I'm of course aware of your effort, but it is easy to forget when it is not marked explicitly in the jira. Let me also say that if we had created a 107 branch as I suggested a while back, we could be marking jiras to produce versions of the patches for that branch. As for what to do now, I'm not sure what the best course of action is. I don't like the idea of reverting the commit, but at the same time I feel bad about giving you more work to do. I'd like to hear what the others involved in this discussion have to say about it.
        Hide
        thawan Thawan Kooburat added a comment -

        I skim through ZK-107 and I think the core logic that this patch try to fix doesn't conflict with ZK-107, but I guess it would need a rewrite. I am fine with any approach, we can also move this patch down to 3.4 and recreate a jira/patch to fix this again in 3.5

        Show
        thawan Thawan Kooburat added a comment - I skim through ZK-107 and I think the core logic that this patch try to fix doesn't conflict with ZK-107, but I guess it would need a rewrite. I am fine with any approach, we can also move this patch down to 3.4 and recreate a jira/patch to fix this again in 3.5
        Hide
        phunt Patrick Hunt added a comment -

        Alex (et. al.) - how close is 107 to being ready for commit? If it's still a ways off then reverting this one patch isn't going to help much in the grander scheme of things.

        Thawan/Flavio - If we revert 1324 then apply 107 how much work is it to get 1324 patch working again? Based on Thawan's comments it doesn't seem like it's too big an issue for you to do.

        Show
        phunt Patrick Hunt added a comment - Alex (et. al.) - how close is 107 to being ready for commit? If it's still a ways off then reverting this one patch isn't going to help much in the grander scheme of things. Thawan/Flavio - If we revert 1324 then apply 107 how much work is it to get 1324 patch working again? Based on Thawan's comments it doesn't seem like it's too big an issue for you to do.
        Hide
        shralex Alexander Shraer added a comment -

        Hi Patrick, IMHO 107 is pretty much ready for commit. Michi is working on the missing C tests, and I'm meeting him this weekend to try to finish this together. So hopefully next week its going to be completely ready. We also need to commit ZK-1620 so that my tests pass.

        Show
        shralex Alexander Shraer added a comment - Hi Patrick, IMHO 107 is pretty much ready for commit. Michi is working on the missing C tests, and I'm meeting him this weekend to try to finish this together. So hopefully next week its going to be completely ready. We also need to commit ZK-1620 so that my tests pass.
        Hide
        fpj Flavio Junqueira added a comment -

        Just to review what I'm reading here:

        1. No one is against reverting and Alex strongly prefers reverting, so we'll do it, yes?
        2. This jira is not marked for 3.4.6, but Thawan mentions the 3.4 branch in his comment. Should the latest patch be committed to 3.4.6? It sounds like it should, so I'm marking it, but please let me know if you disagree.
        3. We will work a different patch for 3.5.0 based on ZOOKEEPER-107.

        If there is any concern about these points, please comment.

        Show
        fpj Flavio Junqueira added a comment - Just to review what I'm reading here: No one is against reverting and Alex strongly prefers reverting, so we'll do it, yes? This jira is not marked for 3.4.6, but Thawan mentions the 3.4 branch in his comment. Should the latest patch be committed to 3.4.6? It sounds like it should, so I'm marking it, but please let me know if you disagree. We will work a different patch for 3.5.0 based on ZOOKEEPER-107 . If there is any concern about these points, please comment.
        Hide
        fpj Flavio Junqueira added a comment -

        Reverted previous commit (Committed revision 1434978.). Need to provide a patch on top of ZOOKEEPER-107 and commit current patch to branch 3.4.

        Show
        fpj Flavio Junqueira added a comment - Reverted previous commit (Committed revision 1434978.). Need to provide a patch on top of ZOOKEEPER-107 and commit current patch to branch 3.4.
        Hide
        fpj Flavio Junqueira added a comment -

        Committed to branch 3.4. Committed revision 1435159.

        Show
        fpj Flavio Junqueira added a comment - Committed to branch 3.4. Committed revision 1435159.
        Hide
        fpj Flavio Junqueira added a comment -

        Thawan Kooburat Should we wrap up this jira? I've checked that there are conflicts, so the latest patch has gone stale. Do you have time to generate a patch for trunk or you want me to do it?

        Show
        fpj Flavio Junqueira added a comment - Thawan Kooburat Should we wrap up this jira? I've checked that there are conflicts, so the latest patch has gone stale. Do you have time to generate a patch for trunk or you want me to do it?
        Hide
        abranzyck Germán Blanco added a comment -

        This patch skips adding the NEWLEADER quorum packet in Leader.startForwarding, since it will be added in LearnerHandler a few lines below the call to Leader.startForwarding.

        Show
        abranzyck Germán Blanco added a comment - This patch skips adding the NEWLEADER quorum packet in Leader.startForwarding, since it will be added in LearnerHandler a few lines below the call to Leader.startForwarding.
        Hide
        abranzyck Germán Blanco added a comment -

        The Zab1.0 tests are modified, since they checked for the NEWLEADER packet to be duplicated.

        Show
        abranzyck Germán Blanco added a comment - The Zab1.0 tests are modified, since they checked for the NEWLEADER packet to be duplicated.
        Hide
        abranzyck Germán Blanco added a comment -

        I hope submitting a proposal patch for a JIRA assigned to someone else is no problem. I prepared this for the duplicate JIRA ZOOKEEPER-1694.

        Show
        abranzyck Germán Blanco added a comment - I hope submitting a proposal patch for a JIRA assigned to someone else is no problem. I prepared this for the duplicate JIRA ZOOKEEPER-1694 .
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12579889/ZOOKEEPER-1324.patch
        against trunk revision 1463329.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12579889/ZOOKEEPER-1324.patch against trunk revision 1463329. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1464//console This message is automatically generated.
        Hide
        fpj Flavio Junqueira added a comment -

        German, In the other jira, I suggested that you had a look at what has been done so far in this jira and if you had any concern with the patch that has been proposed, that you please raise it. Could you please have a look at the latest patch proposed before yours add comments to this jira? If you make any additions, please do it on top of the patch that has been proposed and reviewed. In the case you make any modification, please explain it in the jira.

        In the case you haven't done it yet, I suggest you have a look at the "how to contribute" page:

        https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute

        Show
        fpj Flavio Junqueira added a comment - German, In the other jira, I suggested that you had a look at what has been done so far in this jira and if you had any concern with the patch that has been proposed, that you please raise it. Could you please have a look at the latest patch proposed before yours add comments to this jira? If you make any additions, please do it on top of the patch that has been proposed and reviewed. In the case you make any modification, please explain it in the jira. In the case you haven't done it yet, I suggest you have a look at the "how to contribute" page: https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute
        Hide
        abranzyck Germán Blanco added a comment -

        OK, I see now that you meant the patch on the 3.4 branch.
        I have read the "how to contribute" page, but I will do it again more carefully.

        Show
        abranzyck Germán Blanco added a comment - OK, I see now that you meant the patch on the 3.4 branch. I have read the "how to contribute" page, but I will do it again more carefully.
        Hide
        fpj Flavio Junqueira added a comment -

        Here is a preliminary patch for trunk. I have tried to port the latest one from Thawan Kooburat, but the changes due to ZOOKEEPER-107 made it non-trivial. I'd really like perhaps Alexander Shraer to also have a look at it to make sure I'm not breaking anything badly for the reconfiguration mechanism.

        One test failed for me: ReconfigRecoveryTest. I haven't had a chance to look too closely into it, though. If anyone has an idea, I'd like to hear, otherwise I'll have a look at it again later on.

        Show
        fpj Flavio Junqueira added a comment - Here is a preliminary patch for trunk. I have tried to port the latest one from Thawan Kooburat , but the changes due to ZOOKEEPER-107 made it non-trivial. I'd really like perhaps Alexander Shraer to also have a look at it to make sure I'm not breaking anything badly for the reconfiguration mechanism. One test failed for me: ReconfigRecoveryTest. I haven't had a chance to look too closely into it, though. If anyone has an idea, I'd like to hear, otherwise I'll have a look at it again later on.
        Hide
        shralex Alexander Shraer added a comment -

        Hi Flavio,

        Sure, I can definitely go over the patch

        Show
        shralex Alexander Shraer added a comment - Hi Flavio, Sure, I can definitely go over the patch
        Hide
        marshall Marshall McMullen added a comment -

        Alex, have you had a chance to go over this patch yet? We're seeing this problem in our production environment and would love to get this addressed as soon as possible.

        Thanks!

        Show
        marshall Marshall McMullen added a comment - Alex, have you had a chance to go over this patch yet? We're seeing this problem in our production environment and would love to get this addressed as soon as possible. Thanks!
        Hide
        shralex Alexander Shraer added a comment -

        Hi Flavio,

        All the reconfiguration-related code that was executed when a NEWLEADER message gets enough acks is now omitted in the patch (previously this was in Leader.java, tryToCommit(), the "else" clause of "if ((zxid & 0xffffffffL) != 0)"). This is important for reconfiguration recovery - the leader discovers a new configuration during recovery and must apply it by executing this code. Its quite possible that this causes the reconfiguration recovery tests to fail.

        There is a related new comment about this "if" in processAck() that I didn't fully understand.

        Thanks,
        Alex

        Show
        shralex Alexander Shraer added a comment - Hi Flavio, All the reconfiguration-related code that was executed when a NEWLEADER message gets enough acks is now omitted in the patch (previously this was in Leader.java, tryToCommit(), the "else" clause of "if ((zxid & 0xffffffffL) != 0)"). This is important for reconfiguration recovery - the leader discovers a new configuration during recovery and must apply it by executing this code. Its quite possible that this causes the reconfiguration recovery tests to fail. There is a related new comment about this "if" in processAck() that I didn't fully understand. Thanks, Alex
        Hide
        abranzyck Germán Blanco added a comment -

        Hello all,
        I hope I don't bother you too much with this comment, given that I don't actually understand most of the code in the current proposed patch.
        It seems to me that this is a problem in 3.4 and it should be corrected with as few changes as possible. Other improvements should go to 3.5. And it seems to me that the problem is that NEWLEADER gets both inserted in the outstandingProposals and sent to the follower directly. So apparently, this could be solved with the very small change bellow. It just skips the inserting of NEWLEADER in the outstanding proposals and keeps all the current processing. Line numbers might not be exact, since it was done for trunk a few weeks ago.
        I could propose it on top of the current patch, but it would mean removing all of it (except where it changes the test case), without really understanding what it is doing, and adding these few lines.
        Could anybody, maybe Flavio, please explain where I get this problem wrong?
        Best regards,
        Germán Blanco.
        — src/java/main/org/apache/zookeeper/server/quorum/Leader.java (revision 1470391)
        +++ src/java/main/org/apache/zookeeper/server/quorum/Leader.java (working copy)
        @@ -1116,10 +1116,11 @@
        List<Long>zxids = new ArrayList<Long>(outstandingProposals.keySet());
        Collections.sort(zxids);
        for (Long zxid: zxids) {

        • if (zxid <= lastSeenZxid) {
          + QuorumPacket qp = outstandingProposals.get(zxid).packet;
          + if ((zxid <= lastSeenZxid) || (qp.getType() == Leader.NEWLEADER)) { continue; }
        • handler.queuePacket(outstandingProposals.get(zxid).packet);
          + handler.queuePacket(qp);
          }
          }
          if (handler.getLearnerType() == LearnerType.PARTICIPANT) {
        Show
        abranzyck Germán Blanco added a comment - Hello all, I hope I don't bother you too much with this comment, given that I don't actually understand most of the code in the current proposed patch. It seems to me that this is a problem in 3.4 and it should be corrected with as few changes as possible. Other improvements should go to 3.5. And it seems to me that the problem is that NEWLEADER gets both inserted in the outstandingProposals and sent to the follower directly. So apparently, this could be solved with the very small change bellow. It just skips the inserting of NEWLEADER in the outstanding proposals and keeps all the current processing. Line numbers might not be exact, since it was done for trunk a few weeks ago. I could propose it on top of the current patch, but it would mean removing all of it (except where it changes the test case), without really understanding what it is doing, and adding these few lines. Could anybody, maybe Flavio, please explain where I get this problem wrong? Best regards, Germán Blanco. — src/java/main/org/apache/zookeeper/server/quorum/Leader.java (revision 1470391) +++ src/java/main/org/apache/zookeeper/server/quorum/Leader.java (working copy) @@ -1116,10 +1116,11 @@ List<Long>zxids = new ArrayList<Long>(outstandingProposals.keySet()); Collections.sort(zxids); for (Long zxid: zxids) { if (zxid <= lastSeenZxid) { + QuorumPacket qp = outstandingProposals.get(zxid).packet; + if ((zxid <= lastSeenZxid) || (qp.getType() == Leader.NEWLEADER)) { continue; } handler.queuePacket(outstandingProposals.get(zxid).packet); + handler.queuePacket(qp); } } if (handler.getLearnerType() == LearnerType.PARTICIPANT) {
        Hide
        fpj Flavio Junqueira added a comment -

        German, This issue has already been resolved for branch 3.4, check this out:

        https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13557243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13557243

        We are now working on the patch for trunk.

        Show
        fpj Flavio Junqueira added a comment - German, This issue has already been resolved for branch 3.4, check this out: https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13557243&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13557243 We are now working on the patch for trunk.
        Hide
        fpj Flavio Junqueira added a comment -

        Alexander Shraer Thanks a lot for having a look at the new patch. Part of this patch was moving a bit of code around. In the original patch prior to ZOOKEEPER-107 getting in, we were removing this "if ((zxid & 0xffffffffL) != 0)" because were not calling tryToCommit for the newleader proposal any longer. When looking at the trunk code, it seemed to me that you had some duplicate code in the else block because you needed to have it executed for both branches (if ((zxid & 0xffffffffL) != 0) succeeds or not). But, because we removed this check "if ((zxid & 0xffffffffL) != 0)", we don't have to duplicate that piece of code any longer, and it is executed in the case that the request a reconfig one.

        For completeness, the duplicated bit of code I'm referring to is this:

        QuorumVerifier newQV = self.getLastSeenQuorumVerifier();
                       
        Long designatedLeader = getDesignatedLeader(p, zxid);                                         
                       
        self.processReconfig(newQV, designatedLeader, zk.getZxid(), true);
        if (designatedLeader != self.getId()) {
                allowedToCommit = false;
        }
        

        One bit that I just realized while writing this post is that the else block in the current trunk code is executed even if the request is not a reconfig request. Is this correct? Note that we don't process the newleader proposal in this method any longer.

        The other thing I was thinking is that if it is really incorrect, then the tests should have captured. Assuming that I need to fix the patch, it would be nice to determine why the reconfig tests didn't capture it and fix the tests.

        Show
        fpj Flavio Junqueira added a comment - Alexander Shraer Thanks a lot for having a look at the new patch. Part of this patch was moving a bit of code around. In the original patch prior to ZOOKEEPER-107 getting in, we were removing this "if ((zxid & 0xffffffffL) != 0)" because were not calling tryToCommit for the newleader proposal any longer. When looking at the trunk code, it seemed to me that you had some duplicate code in the else block because you needed to have it executed for both branches (if ((zxid & 0xffffffffL) != 0) succeeds or not). But, because we removed this check "if ((zxid & 0xffffffffL) != 0)", we don't have to duplicate that piece of code any longer, and it is executed in the case that the request a reconfig one. For completeness, the duplicated bit of code I'm referring to is this: QuorumVerifier newQV = self.getLastSeenQuorumVerifier(); Long designatedLeader = getDesignatedLeader(p, zxid); self.processReconfig(newQV, designatedLeader, zk.getZxid(), true ); if (designatedLeader != self.getId()) { allowedToCommit = false ; } One bit that I just realized while writing this post is that the else block in the current trunk code is executed even if the request is not a reconfig request. Is this correct? Note that we don't process the newleader proposal in this method any longer. The other thing I was thinking is that if it is really incorrect, then the tests should have captured. Assuming that I need to fix the patch, it would be nice to determine why the reconfig tests didn't capture it and fix the tests.
        Hide
        shralex Alexander Shraer added a comment -

        Hi Flavio,

        no problem! yes, this is the code I'm referring to. Right, this will run if the packet is not
        reconfig (the "else" clause), specifically when the leader finds out an outstanding reconfig
        it must complete during recovery.

        A recap of how this part of the recovery process works - the leader sends the new config it must complete to others inside a NEWLEADER message (see LearnerHandler where the NEWLEADER message is constructed), and once it has enough acks this code you quote is executed so that it applies the config to itself. The NEWLEADER acts as a reconfig proposal, and the UPTODATE message acts as a reconfig "commit". So - this code must still be executed somewhere in a similar situation as before (i.e., where you have enough acks for NEWLEADER). Actually, I totally missed the fact that startForwarding sends out the outstandingProposals, so for example I'm not putting the configuration in the NEWLEADER message which was inserted into the outstandingProposals queue in Leader.java - I'm guessing that this duplicate NEWLEADER message is a problem for reconfig recovery too!

        BTW, could you please address the other part of German's comment ? Why wouldn't his proposed change solve the issue (skip the NEWLEADER message in the startForwarding() where you send all messages in the outstaningProposals) ? I think if you explain it'll help us understand better what you guys are doing in this patch

        Thanks,
        Alex

        Show
        shralex Alexander Shraer added a comment - Hi Flavio, no problem! yes, this is the code I'm referring to. Right, this will run if the packet is not reconfig (the "else" clause), specifically when the leader finds out an outstanding reconfig it must complete during recovery. A recap of how this part of the recovery process works - the leader sends the new config it must complete to others inside a NEWLEADER message (see LearnerHandler where the NEWLEADER message is constructed), and once it has enough acks this code you quote is executed so that it applies the config to itself. The NEWLEADER acts as a reconfig proposal, and the UPTODATE message acts as a reconfig "commit". So - this code must still be executed somewhere in a similar situation as before (i.e., where you have enough acks for NEWLEADER). Actually, I totally missed the fact that startForwarding sends out the outstandingProposals, so for example I'm not putting the configuration in the NEWLEADER message which was inserted into the outstandingProposals queue in Leader.java - I'm guessing that this duplicate NEWLEADER message is a problem for reconfig recovery too! BTW, could you please address the other part of German's comment ? Why wouldn't his proposed change solve the issue (skip the NEWLEADER message in the startForwarding() where you send all messages in the outstaningProposals) ? I think if you explain it'll help us understand better what you guys are doing in this patch Thanks, Alex
        Show
        fpj Flavio Junqueira added a comment - It sounds like German's approach is pretty much the same I had proposed previously in ZOOKEEPER-1319 . You can also check this comment in this same jira: https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13543365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13543365 Thawan Kooburat gives a good summary of the changes proposed here: https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13547301&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13547301 and here: https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13549144&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13549144
        Hide
        fpj Flavio Junqueira added a comment -

        Alexander Shraer Most of the else block has been moved to startZkServer. I have not moved the reconfig part, though, but from what I'm getting in your comments, it sounds like we should.

        I also just remembered that one of the reconfig tests failed, so perhaps this is the cause.

        Show
        fpj Flavio Junqueira added a comment - Alexander Shraer Most of the else block has been moved to startZkServer. I have not moved the reconfig part, though, but from what I'm getting in your comments, it sounds like we should. I also just remembered that one of the reconfig tests failed, so perhaps this is the cause.
        Hide
        shralex Alexander Shraer added a comment -

        Yeah, I think German is proposing the same thing as in your link, and it seems that its a one-line solution to the duplicate NEWLEADER problem.

        A few thoughts about Thawan's comments:

        I agree that it would be better not to send outstandingProposals to observers and not to ack UPTODATE messages, but these seem like separate issues from this JIRA's purpose.

        • the following is already done in the trunk (part of ZK-107):

        >1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the
        > quorum to form prematurely. I will submit a unit test in a separate jira

        • I don't fully understand the following comment, don't think it works this way:

        > 2. We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to
        > the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever > a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed.

        If I understand correctly, when a new learner joins an already established leader we don't need to commit a NEWLEADER message, so no need to put it in outstanding proposals.

        So it seems like the main remaining motivation for this patch is decoupling the ACK processing for NEWLEADER from the ACK processing of other messages.

        Show
        shralex Alexander Shraer added a comment - Yeah, I think German is proposing the same thing as in your link, and it seems that its a one-line solution to the duplicate NEWLEADER problem. A few thoughts about Thawan's comments: I agree that it would be better not to send outstandingProposals to observers and not to ack UPTODATE messages, but these seem like separate issues from this JIRA's purpose. the following is already done in the trunk (part of ZK-107): >1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the > quorum to form prematurely. I will submit a unit test in a separate jira I don't fully understand the following comment, don't think it works this way: > 2. We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to > the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever > a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed. If I understand correctly, when a new learner joins an already established leader we don't need to commit a NEWLEADER message, so no need to put it in outstanding proposals. So it seems like the main remaining motivation for this patch is decoupling the ACK processing for NEWLEADER from the ACK processing of other messages.
        Hide
        fpj Flavio Junqueira added a comment -

        I agree that it would be better not to send outstandingProposals to observers and not to ack UPTODATE messages, but these seem like separate issues from this JIRA's purpose.

        Sure, it's a good observation, but I think we ended up not doing anything about it, at least I can't see anything related to it in the patch.

        The following is already done in the trunk (part of ZK-107): 1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the quorum to form prematurely. I will submit a unit test in a separate jira

        This seems to be part of the merge, it would be good if you could check that it is not conflicting with what was done in ZOOKEEPER-107.

        We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed.

        There was probably some confusion here, I don't think there is anything in the patch trying to do things this way. We simply remove the line that adds newleader to outstandingProposals.

        So it seems like the main remaining motivation for this patch is decoupling the ACK processing for NEWLEADER from the ACK processing of other messages.

        This is the main motivation for me.

        I'll produce a new patch shortly.

        Show
        fpj Flavio Junqueira added a comment - I agree that it would be better not to send outstandingProposals to observers and not to ack UPTODATE messages, but these seem like separate issues from this JIRA's purpose. Sure, it's a good observation, but I think we ended up not doing anything about it, at least I can't see anything related to it in the patch. The following is already done in the trunk (part of ZK-107): 1. This change also prevent the leader from counting Observer's NEWLEADER_ACK causing the quorum to form prematurely. I will submit a unit test in a separate jira This seems to be part of the merge, it would be good if you could check that it is not conflicting with what was done in ZOOKEEPER-107 . We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed. There was probably some confusion here, I don't think there is anything in the patch trying to do things this way. We simply remove the line that adds newleader to outstandingProposals. So it seems like the main remaining motivation for this patch is decoupling the ACK processing for NEWLEADER from the ACK processing of other messages. This is the main motivation for me. I'll produce a new patch shortly.
        Hide
        shralex Alexander Shraer added a comment -

        Regarding the membership check - I think this is done here in the patch:

        if (learnerType == LearnerType.PARTICIPANT)

        { newLeaderProposal.addAck(sid); }

        the implementation of addAck already checks that sid is a voting member of each required config of newLeaderProposal before counting the ack for that config. So the extra check above is not necessary (although it doesn't hurt).

        Show
        shralex Alexander Shraer added a comment - Regarding the membership check - I think this is done here in the patch: if (learnerType == LearnerType.PARTICIPANT) { newLeaderProposal.addAck(sid); } the implementation of addAck already checks that sid is a voting member of each required config of newLeaderProposal before counting the ack for that config. So the extra check above is not necessary (although it doesn't hurt).
        Hide
        thawan Thawan Kooburat added a comment -

        We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed.

        This is just a response to the initial attempt to fix this bug in this patch (https://issues.apache.org/jira/secure/attachment/12563751/ZOOKEEPER-1324.patch) where we try to reuse processAck to collect votes for NEWLEADER_ACK.

        Comments about the latest patch:
        1. in 3.4 version of this patch, we have an if statement to ignore UPDATE_ACK ((zxid & 0xffffffffL) == 0) in processAck(). Looking at the code, the learner from both 3.4 and 3.5 still sending UPTODATE_ACK, so I think we still need to have that similar logic unless it is needed by reconfiguration.

        2. (Minor) in ackSetsToString(), can you change the delimiter from ":" to "," to match 3.4 branch?

        Show
        thawan Thawan Kooburat added a comment - We are sending NEWLEADER proposal implicitly as part of sending outstanding proposals to the learners. Now you have to keep adding NEWLEADER back into outstanding proposals whenever a new learner joins the quorum since it is removed whenever NEWLEADER packet is committed. This is just a response to the initial attempt to fix this bug in this patch ( https://issues.apache.org/jira/secure/attachment/12563751/ZOOKEEPER-1324.patch ) where we try to reuse processAck to collect votes for NEWLEADER_ACK. Comments about the latest patch: 1. in 3.4 version of this patch, we have an if statement to ignore UPDATE_ACK ((zxid & 0xffffffffL) == 0) in processAck(). Looking at the code, the learner from both 3.4 and 3.5 still sending UPTODATE_ACK, so I think we still need to have that similar logic unless it is needed by reconfiguration. 2. (Minor) in ackSetsToString(), can you change the delimiter from ":" to "," to match 3.4 branch?
        Hide
        fpj Flavio Junqueira added a comment - - edited

        Regarding the membership check - I think this is done here in the patch

        I have fixed it in the new patch I uploaded. When you mentioned a check for observers, I thought it was related to outstanding proposals. The one you point out is in the processing of acks, right?

        In any case, have a look at the new patch, please. Special attention to Leader#startZkServer().

        All tests pass with this patch for me.

        Show
        fpj Flavio Junqueira added a comment - - edited Regarding the membership check - I think this is done here in the patch I have fixed it in the new patch I uploaded. When you mentioned a check for observers, I thought it was related to outstanding proposals. The one you point out is in the processing of acks, right? In any case, have a look at the new patch, please. Special attention to Leader#startZkServer(). All tests pass with this patch for me.
        Hide
        shralex Alexander Shraer added a comment -

        One case where this "if" actually changed behavior is when sid is switching from being Observer to being Participant. The "if" would see that sid is an Ovserver in the current config and prevent sid from counting towards the new-config too, where sid is actually a Participant.

        I'm not 100% sure what the right thing to do here. On one hand, acking a NEWLEADER message means that state transfer is complete, Observer or not, so why not count this guy's vote. On the other hand, he is not yet a participant, and in the normal (non-recovery) case we wouldn't count its vote (he wouldn't send any).

        My intuition is that both options are safe in this case since the ACK means state transfer complete, and for liveness its better to allow it to vote, i.e., to remove the if as Flavio now did.

        Show
        shralex Alexander Shraer added a comment - One case where this "if" actually changed behavior is when sid is switching from being Observer to being Participant. The "if" would see that sid is an Ovserver in the current config and prevent sid from counting towards the new-config too, where sid is actually a Participant. I'm not 100% sure what the right thing to do here. On one hand, acking a NEWLEADER message means that state transfer is complete, Observer or not, so why not count this guy's vote. On the other hand, he is not yet a participant, and in the normal (non-recovery) case we wouldn't count its vote (he wouldn't send any). My intuition is that both options are safe in this case since the ACK means state transfer complete, and for liveness its better to allow it to vote, i.e., to remove the if as Flavio now did.
        Hide
        shralex Alexander Shraer added a comment -

        Thawan, I think that (zxid & 0xffffffffL) == 0 means its a NEWLEADER ack, not UPTODATE ack.
        See how the zxid of NEWLEADER message is chosen in leader.lead()

        Show
        shralex Alexander Shraer added a comment - Thawan, I think that (zxid & 0xffffffffL) == 0 means its a NEWLEADER ack, not UPTODATE ack. See how the zxid of NEWLEADER message is chosen in leader.lead()
        Hide
        thawan Thawan Kooburat added a comment -

        Both NEWLEADER_ACK and UPTODATE_ACK is using the same zxid. See Learner.java line 322,476-477 (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java?revision=1453693&view=markup). I think the new code should check of package type instead of checking zxid. It will make the code clearer.

        Show
        thawan Thawan Kooburat added a comment - Both NEWLEADER_ACK and UPTODATE_ACK is using the same zxid. See Learner.java line 322,476-477 ( http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Learner.java?revision=1453693&view=markup ). I think the new code should check of package type instead of checking zxid. It will make the code clearer.
        Hide
        shralex Alexander Shraer added a comment -

        I see. I wonder if it'll break anything not to send the UPTODATE ACK at all. I don't think its needed by the protocol.

        Show
        shralex Alexander Shraer added a comment - I see. I wonder if it'll break anything not to send the UPTODATE ACK at all. I don't think its needed by the protocol.
        Hide
        fpj Flavio Junqueira added a comment -

        Thawan Kooburat

        in 3.4 version of this patch, we have an if statement to ignore UPDATE_ACK ((zxid & 0xffffffffL) == 0) in processAck(). Looking at the code, the learner from both 3.4 and 3.5 still sending UPTODATE_ACK, so I think we still need to have that similar logic unless it is needed by reconfiguration.

        We still have that if statement. The one I was talking about previously that has been removed is in Leader#tryToCommit().

        Alexander Shraer

        acking a NEWLEADER message means that state transfer is complete, Observer or not, so why not count this guy's vote. On the other hand, he is not yet a participant, and in the normal (non-recovery) case we wouldn't count its vote (he wouldn't send any).

        In the case you describe, I thought that the observer (changing to participant) should actually be able to vote because we also need a quorum for the new config, which may also come in a newleader message, no? If this is correct, then we shouldn't have the if statement.

        I think that (zxid & 0xffffffffL) == 0 means its a NEWLEADER ack, not UPTODATE ack.

        You're both right. We send a second ack after the uptodate, and it uses the same zxid. It is there for backward compatibility reasons, the ack itself is strictly necessary for the correctness of the protocol.

        Once we decide what to do with the if statement Alex referred to, I'll make any necessary changes and upload a new patch.

        Show
        fpj Flavio Junqueira added a comment - Thawan Kooburat in 3.4 version of this patch, we have an if statement to ignore UPDATE_ACK ((zxid & 0xffffffffL) == 0) in processAck(). Looking at the code, the learner from both 3.4 and 3.5 still sending UPTODATE_ACK, so I think we still need to have that similar logic unless it is needed by reconfiguration. We still have that if statement. The one I was talking about previously that has been removed is in Leader#tryToCommit(). Alexander Shraer acking a NEWLEADER message means that state transfer is complete, Observer or not, so why not count this guy's vote. On the other hand, he is not yet a participant, and in the normal (non-recovery) case we wouldn't count its vote (he wouldn't send any). In the case you describe, I thought that the observer (changing to participant) should actually be able to vote because we also need a quorum for the new config, which may also come in a newleader message, no? If this is correct, then we shouldn't have the if statement. I think that (zxid & 0xffffffffL) == 0 means its a NEWLEADER ack, not UPTODATE ack. You're both right. We send a second ack after the uptodate, and it uses the same zxid. It is there for backward compatibility reasons, the ack itself is strictly necessary for the correctness of the protocol. Once we decide what to do with the if statement Alex referred to, I'll make any necessary changes and upload a new patch.
        Hide
        fpj Flavio Junqueira added a comment -

        Cancelling patch to address comments.

        Show
        fpj Flavio Junqueira added a comment - Cancelling patch to address comments.
        Hide
        shralex Alexander Shraer added a comment -

        > In the case you describe, I thought that the observer (changing to participant) should
        > actually be able to vote because we also need a quorum for the new config, which may also come
        > in a newleader message, no? If this is correct, then we shouldn't have the if statement.

        Yes, that's why I also think that for sake of liveness we should let him vote and remove the if as you did. I was just pointing out that this is slightly different than what we do for the normal case, when this is a reconfig op and not a NEWLEADER message. In that case this server
        wouldn't even be able to vote since observers don't vote. Anyway, seems like he should vote if possible, so the if should not be there.

        Show
        shralex Alexander Shraer added a comment - > In the case you describe, I thought that the observer (changing to participant) should > actually be able to vote because we also need a quorum for the new config, which may also come > in a newleader message, no? If this is correct, then we shouldn't have the if statement. Yes, that's why I also think that for sake of liveness we should let him vote and remove the if as you did. I was just pointing out that this is slightly different than what we do for the normal case, when this is a reconfig op and not a NEWLEADER message. In that case this server wouldn't even be able to vote since observers don't vote. Anyway, seems like he should vote if possible, so the if should not be there.
        Hide
        fpj Flavio Junqueira added a comment -

        There is really only the minor change that Thawan has requested, everything else is the same. Is this patch ready to go?

        Show
        fpj Flavio Junqueira added a comment - There is really only the minor change that Thawan has requested, everything else is the same. Is this patch ready to go?
        Hide
        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12581700/ZOOKEEPER-1324.patch
        against trunk revision 1463329.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12581700/ZOOKEEPER-1324.patch against trunk revision 1463329. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1468//console This message is automatically generated.
        Hide
        phunt Patrick Hunt added a comment -

        This has been committed to branch-3.4, is it ready for trunk?

        Show
        phunt Patrick Hunt added a comment - This has been committed to branch-3.4, is it ready for trunk?
        Hide
        fpj Flavio Junqueira added a comment -

        Alexander Shraer has reviewed it, but has not +1 it. Alex? It would be nice if Thawan Kooburat could +1 it too, since he originated the initial patch and I have modified it to apply to trunk with the ZOOKEEPER-107 changes.

        Show
        fpj Flavio Junqueira added a comment - Alexander Shraer has reviewed it, but has not +1 it. Alex? It would be nice if Thawan Kooburat could +1 it too, since he originated the initial patch and I have modified it to apply to trunk with the ZOOKEEPER-107 changes.
        Hide
        shralex Alexander Shraer added a comment -

        +1

        Show
        shralex Alexander Shraer added a comment - +1
        Hide
        thawan Thawan Kooburat added a comment -

        +1 for me.

        Minor suggestion is to incorporate some portion of Alex's comment(https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13647380&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13647380) into the code itself. Since the processReconfig() is now moved into startZkServer() and it might hard to trace back to its origin in the future.

        Show
        thawan Thawan Kooburat added a comment - +1 for me. Minor suggestion is to incorporate some portion of Alex's comment( https://issues.apache.org/jira/browse/ZOOKEEPER-1324?focusedCommentId=13647380&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13647380 ) into the code itself. Since the processReconfig() is now moved into startZkServer() and it might hard to trace back to its origin in the future.
        Hide
        fpj Flavio Junqueira added a comment -

        Added a comment as requested. Is it ok?

        Show
        fpj Flavio Junqueira added a comment - Added a comment as requested. Is it ok?
        Hide
        thawan Thawan Kooburat added a comment -

        Thanks, we should commit this one

        Show
        thawan Thawan Kooburat added a comment - Thanks, we should commit this one
        Hide
        fpj Flavio Junqueira added a comment -

        Trunk commit: Committed revision 1482318.

        Show
        fpj Flavio Junqueira added a comment - Trunk commit: Committed revision 1482318.
        Hide
        fpj Flavio Junqueira added a comment -

        Latest patch didn't apply cleanly to me, so I'm uploading what I have committed.

        Show
        fpj Flavio Junqueira added a comment - Latest patch didn't apply cleanly to me, so I'm uploading what I have committed.
        Hide
        hudson Hudson added a comment -

        Integrated in ZooKeeper-trunk #1926 (See https://builds.apache.org/job/ZooKeeper-trunk/1926/)
        ZOOKEEPER-1324. Remove Duplicate NEWLEADER packets
        from the Leader to the Follower. (Thawan, fpj via fpj) (Revision 1482318)

        Result = SUCCESS
        fpj : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1482318
        Files :

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
        Show
        hudson Hudson added a comment - Integrated in ZooKeeper-trunk #1926 (See https://builds.apache.org/job/ZooKeeper-trunk/1926/ ) ZOOKEEPER-1324 . Remove Duplicate NEWLEADER packets from the Leader to the Follower. (Thawan, fpj via fpj) (Revision 1482318) Result = SUCCESS fpj : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1482318 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
        Hide
        fpj Flavio Junqueira added a comment -

        Closing issues after releasing 3.4.6.

        Show
        fpj Flavio Junqueira added a comment - Closing issues after releasing 3.4.6.

          People

          • Assignee:
            fpj Flavio Junqueira
            Reporter:
            mahadev Mahadev konar
          • Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development