ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-876

Unnecessary snapshot transfers between new leader and followers

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.5.0
    • Component/s: None
    • Labels:
      None

      Description

      When starting a new leadership, unnecessary snapshot transfers happen between new leader and followers. This is so because of multiple small bugs.

      1) the comparison of zxids is done based on a new proposal, instead of the last logged zxid. (LearnerHandler.java ~ 297)
      2) if follower is one zxid behind, the check of the interval of committed logs excludes the follower. (LearnerHandler.java ~ 277)
      3) the bug reported in ZOOKEEPER-874 (commitLogs are empty after recover).

      1. second_case.patch
        5 kB
        Diogo
      2. second_case.txt
        31 kB
        Diogo
      3. TEST-org.apache.zookeeper.test.FollowerResyncConcurrencyTest.txt
        652 kB
        Flavio Junqueira
      4. ZOOKEEPER-876.patch
        9 kB
        Diogo

        Issue Links

          Activity

          Hide
          Diogo added a comment -

          If follower is one zxid behind, the check of the interval of committed logs excludes the follower in LearnerFollower.java:269. For example, if follower has snapshot-1000 and no logs, and leader has snapshot-1000, log-1001, and so on. The check in that line would force the follower receive again a snapshot. I think the condition should consider minCommittedLog - 1 as lower bound in the test.

          Show
          Diogo added a comment - If follower is one zxid behind, the check of the interval of committed logs excludes the follower in LearnerFollower.java:269. For example, if follower has snapshot-1000 and no logs, and leader has snapshot-1000, log-1001, and so on. The check in that line would force the follower receive again a snapshot. I think the condition should consider minCommittedLog - 1 as lower bound in the test.
          Hide
          Diogo added a comment -

          Depends on the patch ZOOKEEPER-874. (Applied in any order)

          Show
          Diogo added a comment - Depends on the patch ZOOKEEPER-874 . (Applied in any order)
          Hide
          Diogo added a comment -

          Some fixes and a unit test.

          Show
          Diogo added a comment - Some fixes and a unit test.
          Hide
          Flavio Junqueira added a comment -

          This is a nice catch, Diogo, and the patch looks good to me. I have a few very quick comments:

          1. Instead of returning a pair of longs in startForwarding, we could simply return maxZxid and read lastProposed directly from the leader object. Doesn't it work?
          2. The first comment of startForwarding is not saying much. Could you please expand it?
          3. Could you please explain in the beginning of the test case what it is supposed to be testing? It is for later remembering what the test does.

          Good job!

          Show
          Flavio Junqueira added a comment - This is a nice catch, Diogo, and the patch looks good to me. I have a few very quick comments: Instead of returning a pair of longs in startForwarding, we could simply return maxZxid and read lastProposed directly from the leader object. Doesn't it work? The first comment of startForwarding is not saying much. Could you please expand it? Could you please explain in the beginning of the test case what it is supposed to be testing? It is for later remembering what the test does. Good job!
          Hide
          Flavio Junqueira added a comment -

          Hi Diogo, What's the status of this patch? Should we submit it?

          Show
          Flavio Junqueira added a comment - Hi Diogo, What's the status of this patch? Should we submit it?
          Hide
          Diogo added a comment -

          Fix of item 2 in description + test case

          Show
          Diogo added a comment - Fix of item 2 in description + test case
          Hide
          Diogo added a comment -

          I provided a new patch for this issue. It works in current trunk. The patch fixes only issue 2 in the list above because the effects of issue 1 are minimal.

          Show
          Diogo added a comment - I provided a new patch for this issue. It works in current trunk. The patch fixes only issue 2 in the list above because the effects of issue 1 are minimal.
          Hide
          Flavio Junqueira added a comment -

          Diogo, I believe you meant to submit this patch for review, correct?

          Show
          Flavio Junqueira added a comment - Diogo, I believe you meant to submit this patch for review, correct?
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12476319/ZOOKEEPER-876.patch
          against trunk revision 1091841.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//testReport/
          Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12476319/ZOOKEEPER-876.patch against trunk revision 1091841. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//testReport/ Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/230//console This message is automatically generated.
          Hide
          Flavio Junqueira added a comment -

          Diogo, Could you please upload the patch again and resubmit?

          Show
          Flavio Junqueira added a comment - Diogo, Could you please upload the patch again and resubmit?
          Hide
          Flavio Junqueira added a comment -

          One test is failing for me with this patch: FollowerResyncConcurrencyTest.

          I'm also not convinced that the predicate of the if block in LearnerHandler will work in all cases. The last proposal before minProposal might not be minProposal - 1. Using "- 1" seems to capture the case in which the two zxids are from the same epoch, but does not cover cases in which they are from different epochs. Actually, I wonder if ignoring the epoch value can lead to incorrect behavior, since it is possible that the predicate evaluates to true while the peer last zxid is not the last before minCommittedLog. Is this observation right?

          Any comment, Diogo?

          Show
          Flavio Junqueira added a comment - One test is failing for me with this patch: FollowerResyncConcurrencyTest. I'm also not convinced that the predicate of the if block in LearnerHandler will work in all cases. The last proposal before minProposal might not be minProposal - 1. Using "- 1" seems to capture the case in which the two zxids are from the same epoch, but does not cover cases in which they are from different epochs. Actually, I wonder if ignoring the epoch value can lead to incorrect behavior, since it is possible that the predicate evaluates to true while the peer last zxid is not the last before minCommittedLog. Is this observation right? Any comment, Diogo?
          Hide
          Flavio Junqueira added a comment -

          Log file of failed test.

          Show
          Flavio Junqueira added a comment - Log file of failed test.
          Hide
          Camille Fournier added a comment -

          FollowerResyncConcurrencyTest won't fail with just the changes to LearnerHandler (which seem to directly address the bug), but the changes in Learner cause it to fail. I notice that you commented out
          // zk.getZKDatabase().processTxn(pif.hdr, pif.rec);

          Did you intend to do this? Adding the transaction to the committed log alone won't suffice for actually applying it to the data tree.

          Show
          Camille Fournier added a comment - FollowerResyncConcurrencyTest won't fail with just the changes to LearnerHandler (which seem to directly address the bug), but the changes in Learner cause it to fail. I notice that you commented out // zk.getZKDatabase().processTxn(pif.hdr, pif.rec); Did you intend to do this? Adding the transaction to the committed log alone won't suffice for actually applying it to the data tree.
          Hide
          Diogo added a comment -

          You are both right.

          @Camille: true, this line should not be commented. The idea was to do exactly what was done before, and to add the proposal of the leader to the committedLog.

          @Flavio: yes, it seems the patch is not taking all cases into account.

          I am uploading a log and a patch with a test case that shows two problems:

          (1) 3 processes out of 5 are started (look for string RESTART QUORUM): p3, p4 and p5. Process p3 is just some zxids behind the other processes, so it should receive a DIFF from leader p5, but instead if receives a SNAP.

          (2) After they are synchronized I stop them, and start them again (look for RESTART QUORUM AGAIN). There was no request processed by the quorum, so p3 and p4 should receive an empty DIFF from leader p5, but both receive a SNAP.

          These problems seem to be caused by a couple of issues in the way the epochs are compared and because the committedLog (which should contain the committed proposals) is sometimes incomplete. I will update the patch later on based on the current trunk and check the issues Flavio mentioned.

          Show
          Diogo added a comment - You are both right. @Camille: true, this line should not be commented. The idea was to do exactly what was done before, and to add the proposal of the leader to the committedLog. @Flavio: yes, it seems the patch is not taking all cases into account. I am uploading a log and a patch with a test case that shows two problems: (1) 3 processes out of 5 are started (look for string RESTART QUORUM): p3, p4 and p5. Process p3 is just some zxids behind the other processes, so it should receive a DIFF from leader p5, but instead if receives a SNAP. (2) After they are synchronized I stop them, and start them again (look for RESTART QUORUM AGAIN). There was no request processed by the quorum, so p3 and p4 should receive an empty DIFF from leader p5, but both receive a SNAP. These problems seem to be caused by a couple of issues in the way the epochs are compared and because the committedLog (which should contain the committed proposals) is sometimes incomplete. I will update the patch later on based on the current trunk and check the issues Flavio mentioned.
          Hide
          Diogo added a comment -

          DiffOnRecover shows the two problems mentioned in the comment

          Show
          Diogo added a comment - DiffOnRecover shows the two problems mentioned in the comment
          Hide
          Diogo added a comment -

          Log of DiffOnRecover

          Show
          Diogo added a comment - Log of DiffOnRecover
          Hide
          Diogo added a comment -

          Let me first focus on the second case above. The scenario is very simple:

          1 - The servers are in sync, all of them are stopped.
          2 - When they are restarted, they elect a new leader and a DIFF is sent.
          3 - If they are stopped and restarted again, then a snapshot will be sent.

          There is however no difference in their state, so there is absolutely no need to transfer a snapshot. The problem seems to be that after the restart the followers have the same zxid of the last accepted update (at step 1), ie, they somehow forgot the NEWLEADER of step 2.

          I am uploading a log that shows that. I used grep to show only the output of QuorumPeer. That is enough to see the problem. The patch I am sending can produce such log (ant test-core-java -Dtestcase="DiffOn*").

          Show
          Diogo added a comment - Let me first focus on the second case above. The scenario is very simple: 1 - The servers are in sync, all of them are stopped. 2 - When they are restarted, they elect a new leader and a DIFF is sent. 3 - If they are stopped and restarted again, then a snapshot will be sent. There is however no difference in their state, so there is absolutely no need to transfer a snapshot. The problem seems to be that after the restart the followers have the same zxid of the last accepted update (at step 1), ie, they somehow forgot the NEWLEADER of step 2. I am uploading a log that shows that. I used grep to show only the output of QuorumPeer. That is enough to see the problem. The patch I am sending can produce such log (ant test-core-java -Dtestcase="DiffOn*").
          Hide
          Mahadev konar added a comment -

          This is not a blocker. MOving this out to 3.5.0

          Show
          Mahadev konar added a comment - This is not a blocker. MOving this out to 3.5.0
          Hide
          Thawan Kooburat added a comment -

          ZOOKEEPER-1413 address all of the issues mentioned in the description. However, there is one corner case (used by the unit test in this patch) which is not covered by ZOOKEEPER-1413 where the learner tries to sync with leader with no txnlog.

          Show
          Thawan Kooburat added a comment - ZOOKEEPER-1413 address all of the issues mentioned in the description. However, there is one corner case (used by the unit test in this patch) which is not covered by ZOOKEEPER-1413 where the learner tries to sync with leader with no txnlog.

            People

            • Assignee:
              Diogo
              Reporter:
              Diogo
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development