ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1413

Use on-disk transaction log for learner sync up

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.3
    • Fix Version/s: 3.5.0
    • Component/s: server

      Description

      Motivation:
      The learner syncs up with leader by retrieving committed log from the leader. Currently, the leader only keeps 500 entries of recently committed log in memory. If the learner falls behind more than 500 updates, the leader will send the entire snapshot to the learner.

      With the size of the snapshot for some of our Zookeeper deployments (~10G), it is prohibitively expensive to send the entire snapshot over network. Additionally, our Zookeeper may serve more than 4K updates per seconds. As a result, a network hiccups for less than a second will cause the learner to use snapshot transfer.

      Design:
      Instead of looking only at committed log in memory, the leader will also look at transaction log on disk. The amount of transaction log kept on disk is configurable and the current default is 100k. This will allow Zookeeper to tolerate longer temporal network failure before initiating the snapshot transfer.

      Implementation:
      We plan to add interface to the persistence layer will can be use to retrieve proposals from on-disk transaction log. These proposals can then be used to send to the learner using existing protocol.

      1. ZOOKEEPER-1413.patch
        23 kB
        Thawan Kooburat
      2. ZOOKEEPER-1413.patch
        70 kB
        Thawan Kooburat
      3. ZOOKEEPER-1413.patch
        87 kB
        Thawan Kooburat
      4. ZOOKEEPER-1413.patch
        87 kB
        Thawan Kooburat
      5. ZOOKEEPER-1413.patch
        87 kB
        Thawan Kooburat
      6. ZOOKEEPER-1413.patch
        87 kB
        Thawan Kooburat
      7. ZOOKEEPER-1413.patch
        88 kB
        Thawan Kooburat
      8. ZOOKEEPER-1413-3.4.patch
        61 kB
        Germán Blanco
      9. ZOOKEEPER-1413-3.4.patch
        88 kB
        Germán Blanco

        Issue Links

          Activity

          Hide
          Germán Blanco added a comment -

          I changed it from Improvement to Bug this morning. Now it is back to what it was.
          The problem that I see without this patch is that sometimes the new peer and the leader end up in an infinite loop.
          I have seen this when there is a new peer that has a higher epoch than the leader.
          Should that be reported in a different JIRA?

          Show
          Germán Blanco added a comment - I changed it from Improvement to Bug this morning. Now it is back to what it was. The problem that I see without this patch is that sometimes the new peer and the leader end up in an infinite loop. I have seen this when there is a new peer that has a higher epoch than the leader. Should that be reported in a different JIRA?
          Hide
          Flavio Junqueira added a comment -

          This is currently marked as a bug, but I don't think it is a bug, it's an improvement. I'd rather leave this one out of 3.4.6.

          Show
          Flavio Junqueira added a comment - This is currently marked as a bug, but I don't think it is a bug, it's an improvement. I'd rather leave this one out of 3.4.6.
          Hide
          Germán Blanco added a comment -

          Forgot to do "svn add" for two new test classes.

          Show
          Germán Blanco added a comment - Forgot to do "svn add" for two new test classes.
          Hide
          Germán Blanco added a comment -

          I am very sorry, I started updating this issue in order to propose the patch for branch 3.4 (which is already prepared and attached), and now I don't know if I should reopen the issue or not. It is already fixed in trunk and I don't think that opening a new issue makes sense.
          Please help me!

          Show
          Germán Blanco added a comment - I am very sorry, I started updating this issue in order to propose the patch for branch 3.4 (which is already prepared and attached), and now I don't know if I should reopen the issue or not. It is already fixed in trunk and I don't think that opening a new issue makes sense. Please help me!
          Hide
          Germán Blanco added a comment -

          Please consider applying this fix also to branch 3.4.
          It fixes a few problems in the synchronization of servers in the quorum.
          E.g. it solves problems when using TRUNC.
          Without this patch if a peer tries to connect to a leader with a higher epoch, it will end up in an infinite loop.

          Show
          Germán Blanco added a comment - Please consider applying this fix also to branch 3.4. It fixes a few problems in the synchronization of servers in the quorum. E.g. it solves problems when using TRUNC. Without this patch if a peer tries to connect to a leader with a higher epoch, it will end up in an infinite loop.
          Hide
          Thawan Kooburat added a comment -

          Flavio, thanks for help reviewing this patch

          Show
          Thawan Kooburat added a comment - Flavio, thanks for help reviewing this patch
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1975 (See https://builds.apache.org/job/ZooKeeper-trunk/1975/)
          ZOOKEEPER-1413. Use on-disk transaction log for learner sync up (thawan) (Revision 1497929)

          Result = FAILURE
          thawan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1497929
          Files :

          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/ivy.xml
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZKDatabase.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #1975 (See https://builds.apache.org/job/ZooKeeper-trunk/1975/ ) ZOOKEEPER-1413 . Use on-disk transaction log for learner sync up (thawan) (Revision 1497929) Result = FAILURE thawan : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1497929 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/ivy.xml /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/TxnLogProposalIterator.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZKDatabase.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/TxnLog.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/LearnerHandlerTest.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/GetProposalFromTxnTest.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12589992/ZOOKEEPER-1413.patch
          against trunk revision 1495522.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589992/ZOOKEEPER-1413.patch against trunk revision 1495522. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1507//console This message is automatically generated.
          Hide
          Thawan Kooburat added a comment -

          Add missing license header

          Show
          Thawan Kooburat added a comment - Add missing license header
          Hide
          Flavio Junqueira added a comment -

          +1, thanks, Thawan. There is one caveat, though, the license header seems to be missing in TxnLogProposalIterator. The one who commits it needs to remember to add the license header, and upload the patch that has been committed.

          Do you want to try to commit this one to check that your accounting setting is ok, Thawan? Please don't forget to edit CHANGES.txt and to add new files.

          Also, it would be good if we decide what to do with ZOOKEEPER-876. My take is that we should resolve it and explain there that there is a corner case that we are not covering in here.

          Show
          Flavio Junqueira added a comment - +1, thanks, Thawan. There is one caveat, though, the license header seems to be missing in TxnLogProposalIterator. The one who commits it needs to remember to add the license header, and upload the patch that has been committed. Do you want to try to commit this one to check that your accounting setting is ok, Thawan? Please don't forget to edit CHANGES.txt and to add new files. Also, it would be good if we decide what to do with ZOOKEEPER-876 . My take is that we should resolve it and explain there that there is a corner case that we are not covering in here.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12589375/ZOOKEEPER-1413.patch
          against trunk revision 1495522.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12589375/ZOOKEEPER-1413.patch against trunk revision 1495522. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1505//console This message is automatically generated.
          Hide
          Thawan Kooburat added a comment -

          Fix bug

          Show
          Thawan Kooburat added a comment - Fix bug
          Hide
          Michi Mutsuzaki added a comment -

          Sorry I was wrong. You shouldn't have to cancel and resubmit. From https://builds.apache.org/job/PreCommit-Admin/ :

          The easiest way to rerun testing of a patch is to upload a new patch (with the same filename is fine) to the same Jira. The combination of a Jira being in Patch Available state AND having a new attachment that has never been processed by this system is what will trigger a new test of the patch.

          It looks like the last 2 pre-commit builds timed out. Not sure if it's because of the patch or something is wrong with the buildbot.

          https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/

          Show
          Michi Mutsuzaki added a comment - Sorry I was wrong. You shouldn't have to cancel and resubmit. From https://builds.apache.org/job/PreCommit-Admin/ : The easiest way to rerun testing of a patch is to upload a new patch (with the same filename is fine) to the same Jira. The combination of a Jira being in Patch Available state AND having a new attachment that has never been processed by this system is what will trigger a new test of the patch. It looks like the last 2 pre-commit builds timed out. Not sure if it's because of the patch or something is wrong with the buildbot. https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/
          Hide
          Michi Mutsuzaki added a comment -

          You might need to cancel the patch and submit the patch again to trigger the pre-commit build.

          Show
          Michi Mutsuzaki added a comment - You might need to cancel the patch and submit the patch again to trigger the pre-commit build.
          Hide
          Thawan Kooburat added a comment -

          upload the patch again

          Show
          Thawan Kooburat added a comment - upload the patch again
          Hide
          Thawan Kooburat added a comment -

          Hmm, why didn't Hadoop QA ran. I thought I only need to attach a new patch file to trigger it

          Show
          Thawan Kooburat added a comment - Hmm, why didn't Hadoop QA ran. I thought I only need to attach a new patch file to trigger it
          Hide
          Thawan Kooburat added a comment -
          • Address comments
          • Move to SLF4J 1.7.5
          Show
          Thawan Kooburat added a comment - Address comments Move to SLF4J 1.7.5
          Hide
          Flavio Junqueira added a comment -

          There is something odd about your comment and it might be a problem with my interpretation. When you say that the leader must have an older history compared to a learner, I'm not sure what that means. A prospective leader must be such that its highest zxid is higher than the highest zxid of any member of a quorum. The history of a learner might include some zxids that the leader doesn't have, in which case the leader either tells the learner to truncate or it sends a snapshot.

          About ZOOKEEPER-876, are you suggesting that we resolve it as "won't fix" for now?

          I was also thinking that you're going to generate a new patch to include some of the comments we agreed upon in the review board. Perhaps I have misunderstood it or you simply haven't had time to do it.

          Show
          Flavio Junqueira added a comment - There is something odd about your comment and it might be a problem with my interpretation. When you say that the leader must have an older history compared to a learner, I'm not sure what that means. A prospective leader must be such that its highest zxid is higher than the highest zxid of any member of a quorum. The history of a learner might include some zxids that the leader doesn't have, in which case the leader either tells the learner to truncate or it sends a snapshot. About ZOOKEEPER-876 , are you suggesting that we resolve it as "won't fix" for now? I was also thinking that you're going to generate a new patch to include some of the comments we agreed upon in the review board. Perhaps I have misunderstood it or you simply haven't had time to do it.
          Hide
          Thawan Kooburat added a comment -

          I was thinking about this. I think my design where is the leader is required to have a older history than the learner might be too strict in some cases, given that Zab guaranteed continuous history.

          Anyway, I don't think this case is common enough that we need to handle it. If there is additional change request, I will create another revision so we can review it.

          Show
          Thawan Kooburat added a comment - I was thinking about this. I think my design where is the leader is required to have a older history than the learner might be too strict in some cases, given that Zab guaranteed continuous history. Anyway, I don't think this case is common enough that we need to handle it. If there is additional change request, I will create another revision so we can review it.
          Hide
          Flavio Junqueira added a comment -

          Since it is a performance issue and not a correctness issue, I can go either way. In the case we don't add the logic to handle the case ZOOKEEPER-876 supposedly does, we need to to decide what to do with that issue.

          Show
          Flavio Junqueira added a comment - Since it is a performance issue and not a correctness issue, I can go either way. In the case we don't add the logic to handle the case ZOOKEEPER-876 supposedly does, we need to to decide what to do with that issue.
          Hide
          Thawan Kooburat added a comment -

          Actually, this fails at db.getProposalsFromTxnLog() as shown in the log. The method return a list of proposal starting for zxid X only if the list of proposal start from zxid X or lower.

          Again this is to help simplify queueCommittedProposal() since it needs to enforce the constraint that I mentioned above.

          Show
          Thawan Kooburat added a comment - Actually, this fails at db.getProposalsFromTxnLog() as shown in the log. The method return a list of proposal starting for zxid X only if the list of proposal start from zxid X or lower. Again this is to help simplify queueCommittedProposal() since it needs to enforce the constraint that I mentioned above.
          Hide
          Thawan Kooburat added a comment -

          It doesn't need to get a snapshot but I don't add a logic to handle this special case, since I thought it wasn't a common case.

          Currently, queueCommittedProposal() will try to use DIFF or TRUNC only if it saw at least 1 txn older than what the learner ask for, so we need to modify it a bit to handle this case.

          Show
          Thawan Kooburat added a comment - It doesn't need to get a snapshot but I don't add a logic to handle this special case, since I thought it wasn't a common case. Currently, queueCommittedProposal() will try to use DIFF or TRUNC only if it saw at least 1 txn older than what the learner ask for, so we need to modify it a bit to handle this case.
          Hide
          Flavio Junqueira added a comment -

          Something must be failing in my reasoning. The test (DiffOnRecover) simply simulates a server starting from scratch while the leader has all the necessary txns in the committedLog. If I get what you're saying, your point is that because the follower is in a previous epoch (in this particular case, there is no previous epoch), it must receive a snapshot and I don't understand why you think this is the case.

          Show
          Flavio Junqueira added a comment - Something must be failing in my reasoning. The test (DiffOnRecover) simply simulates a server starting from scratch while the leader has all the necessary txns in the committedLog. If I get what you're saying, your point is that because the follower is in a previous epoch (in this particular case, there is no previous epoch), it must receive a snapshot and I don't understand why you think this is the case.
          Hide
          Thawan Kooburat added a comment -

          You are using DiffOnRecoverTest right?

          2013-06-03 22:42:25,398 [myid:] - WARN  [LearnerHandler-/127.0.0.1:61449:ZKDatabase@307] - Unable to find proposals from txnlog for zxid: 0
          

          The leader will use DIFF to sync with a learner only if it has more history than what the learner request. In this case, it has never seen any txn prior to zxid==0, so it fall back to use a snapshot.

          This case is already covered by LearnerHandlerTest.testNewEpochZxid()

          Show
          Thawan Kooburat added a comment - You are using DiffOnRecoverTest right? 2013-06-03 22:42:25,398 [myid:] - WARN [LearnerHandler-/127.0.0.1:61449:ZKDatabase@307] - Unable to find proposals from txnlog for zxid: 0 The leader will use DIFF to sync with a learner only if it has more history than what the learner request. In this case, it has never seen any txn prior to zxid==0, so it fall back to use a snapshot. This case is already covered by LearnerHandlerTest.testNewEpochZxid()
          Hide
          Flavio Junqueira added a comment -

          I have tried this patch with the test case of ZOOKEEPER-876 and the test fails. Here are a couple of lines in the output log:

          2013-06-03 22:42:25,396 [myid:] - INFO  [LearnerHandler-/127.0.0.1:61449:LearnerHandler@583] - Synchronizing with Follower sid: 3 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x0
          2013-06-03 22:42:25,396 [myid:] - INFO  [LearnerHandler-/127.0.0.1:61450:LearnerHandler@583] - Synchronizing with Follower sid: 4 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x100000005
          

          There is a server with id 5 that is he leader and 5 ends up sending a snapshot to server 3 although the committedLog contains all txns it needs:

          2013-06-03 22:42:25,401 [myid:] - INFO  [LearnerHandler-/127.0.0.1:61449:LearnerHandler@368] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000 sent zxid of db as 0x100000005
          

          and this message seems to be related:

          2013-06-03 22:42:25,398 [myid:] - WARN  [LearnerHandler-/127.0.0.1:61449:ZKDatabase@307] - Unable to find proposals from txnlog for zxid: 0
          
          Show
          Flavio Junqueira added a comment - I have tried this patch with the test case of ZOOKEEPER-876 and the test fails. Here are a couple of lines in the output log: 2013-06-03 22:42:25,396 [myid:] - INFO [LearnerHandler-/127.0.0.1:61449:LearnerHandler@583] - Synchronizing with Follower sid: 3 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x0 2013-06-03 22:42:25,396 [myid:] - INFO [LearnerHandler-/127.0.0.1:61450:LearnerHandler@583] - Synchronizing with Follower sid: 4 maxCommittedLog=0x100000005 minCommittedLog=0x100000001 lastProcessedZxid=0x100000005 peerLastZxid=0x100000005 There is a server with id 5 that is he leader and 5 ends up sending a snapshot to server 3 although the committedLog contains all txns it needs: 2013-06-03 22:42:25,401 [myid:] - INFO [LearnerHandler-/127.0.0.1:61449:LearnerHandler@368] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000 sent zxid of db as 0x100000005 and this message seems to be related: 2013-06-03 22:42:25,398 [myid:] - WARN [LearnerHandler-/127.0.0.1:61449:ZKDatabase@307] - Unable to find proposals from txnlog for zxid: 0
          Hide
          Thawan Kooburat added a comment -

          About snapshotSizeFactor, this parameter need to be tuned in conjunction with ZOOKEEPER-1709.

          Let says your snapshot size is 600MB, but the most recent txnlog file is 2GB. We don't want to load txnlog at all since it is slower than sending a snapshot even if the learner may only need to a few txn at the end of the txnlog file.

          With ZOOKEEPER-1709, you can limit the size of the most recent txnlog to 200MB. If snapshot factor is configured at 1/3. The leader will use txnlog if it only needs to load 1 txnlog file.

          I haven't done extensive testing to figure out the optimal setting. In our prod, the txnlog size limit should be small enough that the leader don't overload if a large number of learners try to synchronize at once. And the snapshot factor is configured such that the leader try to use at least 1 txnlog.

          Show
          Thawan Kooburat added a comment - About snapshotSizeFactor, this parameter need to be tuned in conjunction with ZOOKEEPER-1709 . Let says your snapshot size is 600MB, but the most recent txnlog file is 2GB. We don't want to load txnlog at all since it is slower than sending a snapshot even if the learner may only need to a few txn at the end of the txnlog file. With ZOOKEEPER-1709 , you can limit the size of the most recent txnlog to 200MB. If snapshot factor is configured at 1/3. The leader will use txnlog if it only needs to load 1 txnlog file. I haven't done extensive testing to figure out the optimal setting. In our prod, the txnlog size limit should be small enough that the leader don't overload if a large number of learners try to synchronize at once. And the snapshot factor is configured such that the leader try to use at least 1 txnlog.
          Hide
          Thawan Kooburat added a comment -

          I haven't looked at ZOOKEEPER-876 in details but I believe this patch address all of the problem mentioned in the description. If there is any case not covered by LearnerHandlerTest, then I can add it

          Show
          Thawan Kooburat added a comment - I haven't looked at ZOOKEEPER-876 in details but I believe this patch address all of the problem mentioned in the description. If there is any case not covered by LearnerHandlerTest, then I can add it
          Hide
          Flavio Junqueira added a comment -

          I have gone through the patch on the review board, but I still need to contrast the patch in ZOOKEEPER-876 against this one. If you already have thoughts about it, let me know.

          Show
          Flavio Junqueira added a comment - I have gone through the patch on the review board, but I still need to contrast the patch in ZOOKEEPER-876 against this one. If you already have thoughts about it, let me know.
          Hide
          Flavio Junqueira added a comment -

          I had forgotten about that issue, good call. I was wondering if we should consider incorporating some of the work, for example, test cases.

          Note that I've marking it so that this jira contains ZOOKEEPER-876, so it supersedes that jira. To confirm, is it the case?

          Show
          Flavio Junqueira added a comment - I had forgotten about that issue, good call. I was wondering if we should consider incorporating some of the work, for example, test cases. Note that I've marking it so that this jira contains ZOOKEEPER-876 , so it supersedes that jira. To confirm, is it the case?
          Hide
          Thawan Kooburat added a comment -

          The main benefit of this patch is to reduce the chance that the leader need to send a snapshot to the learner. So it tries to fix various corner cases in synchronization logic as reported in JIRAs like ZOOKEEPER-1465, ZOOKEEPER-876 (I just found this one)

          The most common corner case that cause the leader to send a snapshot to learners is when the learner fall behind by more than 1 epoch. This is possible if the leader fail to form a quorum and move to a new epoch in a quick succession (let say you have situation like ZOOKEEPER-1697).

          For us, it is prohibitively expensive to send a snapshot because we have a large number of observers and sending a snapshot will kick-out client sessions on RO-observers (ZOOKEEPER-1607). I don't have a concrete number at the moment, but this is one of the critical feature that kept our infrastructure running.

          Show
          Thawan Kooburat added a comment - The main benefit of this patch is to reduce the chance that the leader need to send a snapshot to the learner. So it tries to fix various corner cases in synchronization logic as reported in JIRAs like ZOOKEEPER-1465 , ZOOKEEPER-876 (I just found this one) The most common corner case that cause the leader to send a snapshot to learners is when the learner fall behind by more than 1 epoch. This is possible if the leader fail to form a quorum and move to a new epoch in a quick succession (let say you have situation like ZOOKEEPER-1697 ). For us, it is prohibitively expensive to send a snapshot because we have a large number of observers and sending a snapshot will kick-out client sessions on RO-observers ( ZOOKEEPER-1607 ). I don't have a concrete number at the moment, but this is one of the critical feature that kept our infrastructure running.
          Hide
          Flavio Junqueira added a comment -

          Also, do you have by any chance numbers showing the difference it makes to the recovery time? If you happen to have, I'd be interested in seeing them.

          Show
          Flavio Junqueira added a comment - Also, do you have by any chance numbers showing the difference it makes to the recovery time? If you happen to have, I'd be interested in seeing them.
          Hide
          Flavio Junqueira added a comment -

          Hi Thawan, Thanks for the patch, I'll review it. I really like this feature.

          Show
          Flavio Junqueira added a comment - Hi Thawan, Thanks for the patch, I'll review it. I really like this feature.
          Hide
          Thawan Kooburat added a comment -

          Let me know if you need me to explain anything about this patch. The high level design documented in this JIRA is still valid. There is a lot of corner cases in synchronization logic, but there is a lot of comments in the code.

          During the development of this patch, I continuously ran FollowerResyncConcurrencyTest for a few days. So the enhancement in that test is to get rid of false positive and aid debugging.

          The current code match what we have been using in our prod for more than half a year.

          Show
          Thawan Kooburat added a comment - Let me know if you need me to explain anything about this patch. The high level design documented in this JIRA is still valid. There is a lot of corner cases in synchronization logic, but there is a lot of comments in the code. During the development of this patch, I continuously ran FollowerResyncConcurrencyTest for a few days. So the enhancement in that test is to get rid of false positive and aid debugging. The current code match what we have been using in our prod for more than half a year.
          Show
          Thawan Kooburat added a comment - https://reviews.apache.org/r/11231/
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583719/ZOOKEEPER-1413.patch
          against trunk revision 1483440.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 12 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12583719/ZOOKEEPER-1413.patch against trunk revision 1483440. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1484//console This message is automatically generated.
          Hide
          Thawan Kooburat added a comment -

          Add missing unit test

          Show
          Thawan Kooburat added a comment - Add missing unit test
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12583715/ZOOKEEPER-1413.patch
          against trunk revision 1483440.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12583715/ZOOKEEPER-1413.patch against trunk revision 1483440. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1483//console This message is automatically generated.
          Hide
          Thawan Kooburat added a comment -

          Will post to review board when all tests passed

          Show
          Thawan Kooburat added a comment - Will post to review board when all tests passed
          Hide
          Flavio Junqueira added a comment -

          This a nice feature, the patch here is currently stale and doesn't apply cleanly to trunk. One simple comment about the patch: please don't leave TODOs in the code, create jira issues instead.

          Show
          Flavio Junqueira added a comment - This a nice feature, the patch here is currently stale and doesn't apply cleanly to trunk. One simple comment about the patch: please don't leave TODOs in the code, create jira issues instead.
          Hide
          Thawan Kooburat added a comment -

          Found some bug with the current implementation, so I will submit a new one after fixing it.

          Show
          Thawan Kooburat added a comment - Found some bug with the current implementation, so I will submit a new one after fixing it.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12519268/ZOOKEEPER-1413.patch
          against trunk revision 1302736.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1004//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519268/ZOOKEEPER-1413.patch against trunk revision 1302736. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1004//console This message is automatically generated.
          Hide
          Thawan Kooburat added a comment -

          The attached patch implements what described in the issue.

          However, we also add simple threshold to determine if it is better to send txnlog or entire snapshot. Currently, we only use txlong if on-disk size of the log is less than one third of the last snapshot.

          Show
          Thawan Kooburat added a comment - The attached patch implements what described in the issue. However, we also add simple threshold to determine if it is better to send txnlog or entire snapshot. Currently, we only use txlong if on-disk size of the log is less than one third of the last snapshot.

            People

            • Assignee:
              Thawan Kooburat
              Reporter:
              Thawan Kooburat
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development