ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1319

Missing data after restarting+expanding a cluster

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1, 3.5.0
    • Component/s: None
    • Labels:
    • Environment:

      Linux (Debian Squeeze)

    • Hadoop Flags:
      Reviewed

      Description

      I've been trying to update to ZK 3.4.0 and have had some issues where some data become inaccessible after adding a node to a cluster. My use case is a bit strange (as explained before on this list) in that I try to grow the cluster dynamically by having an external program automatically restart Zookeeper servers in a controlled way whenever the list of participating ZK servers needs to change. This used to work just fine in 3.3.3 (and before), so this represents a regression.

      The scenario I see is this:

      1) Start up a 1-server ZK cluster (the server has ZK ID 0).
      2) A client connects to the server, and makes a bunch of znodes, in particular a znode called "/membership".
      3) Shut down the cluster.
      4) Bring up a 2-server ZK cluster, including the original server 0 with its existing data, and a new server with ZK ID 1.
      5) Node 0 has the highest zxid and is elected leader.
      6) A client connecting to server 1 tries to "get /membership" and gets back a -101 error code (no such znode).
      7) The same client then tries to "create /membership" and gets back a -110 error code (znode already exists).
      8) Clients connecting to server 0 can successfully "get /membership".

      I will attach a tarball with debug logs for both servers, annotating where steps #1 and #4 happen. You can see that the election involves a proposal for zxid 110 from server 0, but immediately following the election server 1 has these lines:

      2011-12-05 17:18:48,308 9299 [QuorumPeer[myid=1]/127.0.0.1:2901] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x100000001 expected 0x1
      2011-12-05 17:18:48,313 9304 [SyncThread:1] INFO org.apache.zookeeper.server.persistence.FileTxnLog - Creating new log file: log.100000001

      Perhaps that's not relevant, but it struck me as odd. At the end of server 1's log you can see a repeated cycle of getData->create->getData as the client tries to make sense of the inconsistent responses.

      The other piece of information is that if I try to use the on-disk directories for either of the servers to start a new one-node ZK cluster, all the data are accessible.

      I haven't tried writing a program outside of my application to reproduce this, but I can do it very easily with some of my app's tests if anyone needs more information.

      1. logs.tgz
        90 kB
        Jeremy Stribling
      2. ZOOKEEPER-1319_trunk.patch
        8 kB
        Patrick Hunt
      3. ZOOKEEPER-1319_trunk2.patch
        8 kB
        Benjamin Reed
      4. ZOOKEEPER-1319.patch
        9 kB
        Patrick Hunt
      5. ZOOKEEPER-1319.patch
        9 kB
        Patrick Hunt

        Activity

        Hide
        Jeremy Stribling added a comment -

        Contains two logs, one for server 0 and one for server 1. The logs are annotated to describe where the cluster was restarted.

        Show
        Jeremy Stribling added a comment - Contains two logs, one for server 0 and one for server 1. The logs are annotated to describe where the cluster was restarted.
        Hide
        Patrick Hunt added a comment -

        I see the problem, a change made in ZOOKEEPER-1136 is causing this regression. setting of lastProposed in the lead() method of Leader was commented out for some reason. As a result the new follower is getting an empty diff rather than a snapshot.

        This is a serious issue as it's causing the follower to get an inconsistent view of the database. We'll need to roll 3.4.1 asap.

        Jeremy – Thanks for reporting this issue!

        Show
        Patrick Hunt added a comment - I see the problem, a change made in ZOOKEEPER-1136 is causing this regression. setting of lastProposed in the lead() method of Leader was commented out for some reason. As a result the new follower is getting an empty diff rather than a snapshot. This is a serious issue as it's causing the follower to get an inconsistent view of the database. We'll need to roll 3.4.1 asap. Jeremy – Thanks for reporting this issue!
        Hide
        Flavio Junqueira added a comment -

        I didn't see lastProposed commented out in the patch of ZOOKEEPER-1136. Have I misunderstood your comment, Pat?

        Show
        Flavio Junqueira added a comment - I didn't see lastProposed commented out in the patch of ZOOKEEPER-1136 . Have I misunderstood your comment, Pat?
        Hide
        Patrick Hunt added a comment -

        @flavio look at line 268 of the patch for 1136.

        Show
        Patrick Hunt added a comment - @flavio look at line 268 of the patch for 1136.
        Hide
        Patrick Hunt added a comment -

        This patch fixes the problem.

        It seems that Zab1_0 tests were not checking for the correct sequence of events on the learner side - that's really frustrating (and why 1136 commented out that section of setting lastProposed - because with it zab1_0 tests failed). Otw (if zab1_0 was right) we would have found this before 3.4.0 shipped.

        Please double check my changes to zab1_0 tests - in particular should we be seeing this sequence? Two NEWLEADERS? (iirc that's expected)

        Show
        Patrick Hunt added a comment - This patch fixes the problem. It seems that Zab1_0 tests were not checking for the correct sequence of events on the learner side - that's really frustrating (and why 1136 commented out that section of setting lastProposed - because with it zab1_0 tests failed). Otw (if zab1_0 was right) we would have found this before 3.4.0 shipped. Please double check my changes to zab1_0 tests - in particular should we be seeing this sequence? Two NEWLEADERS? (iirc that's expected)
        Hide
        Patrick Hunt added a comment -

        btw, I added a new test which fails prior to fixing the issue in Leader.java, it succeeds subsequent to fixing the issue.
        org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testLaggingFollowerResyncsUnderNewEpoch()

        Show
        Patrick Hunt added a comment - btw, I added a new test which fails prior to fixing the issue in Leader.java, it succeeds subsequent to fixing the issue. org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testLaggingFollowerResyncsUnderNewEpoch()
        Hide
        Patrick Hunt added a comment -

        minor change to patch - fixed indent issue.

        Show
        Patrick Hunt added a comment - minor change to patch - fixed indent issue.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506515/ZOOKEEPER-1319.patch
        against trunk revision 1208979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/807//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506515/ZOOKEEPER-1319.patch against trunk revision 1208979. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/807//console This message is automatically generated.
        Hide
        Patrick Hunt added a comment -

        Same patch applied to trunk.

        Show
        Patrick Hunt added a comment - Same patch applied to trunk.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506524/ZOOKEEPER-1319_trunk.patch
        against trunk revision 1208979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506524/ZOOKEEPER-1319_trunk.patch against trunk revision 1208979. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/808//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506524/ZOOKEEPER-1319_trunk.patch
        against trunk revision 1208979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506524/ZOOKEEPER-1319_trunk.patch against trunk revision 1208979. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/809//console This message is automatically generated.
        Hide
        Flavio Junqueira added a comment -

        +1, looks good, Pat. About the double occurrence of NEWLEADER, it happens because we insert NEWLEADER in outstandingRequests in Leader.lead() and queue a NEWLEADER message in LearnerHandler.run(). When we execute LearnerHandler.startForwarding() from LearnerHandler.run(), we queue the packets in outstandingRequests, including NEWLEADER.

        It is not necessary to send it again in startForwarding(), but we do need it in outstandingRequests to collect acks. Since we have to add it to outstandingRequests, one simple way to avoid it is by performing a check like this in startForwarding:

                        if(outstandingProposals.get(zxid).packet.getType() != NEWLEADER){
                            handler.queuePacket(outstandingProposals.get(zxid).packet);
                        }
        
        

        I have verified that by including this check, I can remove the double occurrence of NEWLEADER in Pat's patch and the test passes. We may want to consider this check in some later release.

        Show
        Flavio Junqueira added a comment - +1, looks good, Pat. About the double occurrence of NEWLEADER, it happens because we insert NEWLEADER in outstandingRequests in Leader.lead() and queue a NEWLEADER message in LearnerHandler.run(). When we execute LearnerHandler.startForwarding() from LearnerHandler.run(), we queue the packets in outstandingRequests, including NEWLEADER. It is not necessary to send it again in startForwarding(), but we do need it in outstandingRequests to collect acks. Since we have to add it to outstandingRequests, one simple way to avoid it is by performing a check like this in startForwarding: if(outstandingProposals.get(zxid).packet.getType() != NEWLEADER){ handler.queuePacket(outstandingProposals.get(zxid).packet); } I have verified that by including this check, I can remove the double occurrence of NEWLEADER in Pat's patch and the test passes. We may want to consider this check in some later release.
        Hide
        Mahadev konar added a comment -

        Flavio/Pat,
        Ben and I had a long discussion on this.
        Here is the gist - There are 2 NEWLEADER packets one added when the Leader just has become a leader and one added in startforwarding as Flavio mentioned above. We need to skip adding the first one (the one in leader.lead()) to the queue of packets to send to the follower. Flavio is right above that if we skip the adding of NEWLEADER in start forwarding we are good. We need to send the NEWLEADER packet in LearnerHandler (Line 390) because that means the end of all syncing up transactions from the Leader for the follower.

        Ben has an updated patch and will update the jira soon tonight.

        Show
        Mahadev konar added a comment - Flavio/Pat, Ben and I had a long discussion on this. Here is the gist - There are 2 NEWLEADER packets one added when the Leader just has become a leader and one added in startforwarding as Flavio mentioned above. We need to skip adding the first one (the one in leader.lead()) to the queue of packets to send to the follower. Flavio is right above that if we skip the adding of NEWLEADER in start forwarding we are good. We need to send the NEWLEADER packet in LearnerHandler (Line 390) because that means the end of all syncing up transactions from the Leader for the follower. Ben has an updated patch and will update the jira soon tonight.
        Hide
        Benjamin Reed added a comment -

        this should fix everything. i'd like to add a couple more unit tests, but the functional fixes are in.

        Show
        Benjamin Reed added a comment - this should fix everything. i'd like to add a couple more unit tests, but the functional fixes are in.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12506569/ZOOKEEPER-1319_trunk2.patch
        against trunk revision 1208979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506569/ZOOKEEPER-1319_trunk2.patch against trunk revision 1208979. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/812//console This message is automatically generated.
        Hide
        Benjamin Reed added a comment -

        never mind, i figured out that the case i wanted to test for was impossible: an outstandingProposal queue with a NL and other proposals, but i realized that other proposals will not be generated until the NL message is committed at which point it will no longer be in the outstandingProposal queue, so the double NL is completely innocuous...

        Show
        Benjamin Reed added a comment - never mind, i figured out that the case i wanted to test for was impossible: an outstandingProposal queue with a NL and other proposals, but i realized that other proposals will not be generated until the NL message is committed at which point it will no longer be in the outstandingProposal queue, so the double NL is completely innocuous...
        Hide
        Benjamin Reed added a comment -

        never mind, i figured out that the case i wanted to test for was impossible: an outstandingProposal queue with a NL and other proposals, but i realized that other proposals will not be generated until the NL message is committed at which point it will no longer be in the outstandingProposal queue, so the double NL is completely innocuous...

        Show
        Benjamin Reed added a comment - never mind, i figured out that the case i wanted to test for was impossible: an outstandingProposal queue with a NL and other proposals, but i realized that other proposals will not be generated until the NL message is committed at which point it will no longer be in the outstandingProposal queue, so the double NL is completely innocuous...
        Hide
        Patrick Hunt added a comment -

        sounds fine, two questions come to mind though.

        1) why make this change now, this has been the behavior for a while, no? seems safer to just leave it as is for 3.4 and change in 3.5+

        2) what are the implications, if any, for backward compatibility with 3.3 based servers?

        Show
        Patrick Hunt added a comment - sounds fine, two questions come to mind though. 1) why make this change now, this has been the behavior for a while, no? seems safer to just leave it as is for 3.4 and change in 3.5+ 2) what are the implications, if any, for backward compatibility with 3.3 based servers?
        Hide
        Flavio Junqueira added a comment -

        Since the double NEWLEADER is harmless, it doesn't sound like a big deal to me to postpone to a future release. It is mainly inefficient to have it that way. At the same time, the fix seems to be simple enough so we could consider.

        I don't think it causes backward compatibility problems. In my understanding, all it happens in the Leaner is executing this twice:

                        case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
                            zk.takeSnapshot();
                            self.setCurrentEpoch(newEpoch);
                            snapshotTaken = true;
                            writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
                            break;
                        }
        
        

        I believe this is idempotent.

        Show
        Flavio Junqueira added a comment - Since the double NEWLEADER is harmless, it doesn't sound like a big deal to me to postpone to a future release. It is mainly inefficient to have it that way. At the same time, the fix seems to be simple enough so we could consider. I don't think it causes backward compatibility problems. In my understanding, all it happens in the Leaner is executing this twice: case Leader.NEWLEADER: // it will be NEWLEADER in v1.0 zk.takeSnapshot(); self.setCurrentEpoch(newEpoch); snapshotTaken = true; writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true); break; } I believe this is idempotent.
        Hide
        Jeremy Stribling added a comment -

        Patch 2 fixes the problem my tests were seeing. Thanks!

        Show
        Jeremy Stribling added a comment - Patch 2 fixes the problem my tests were seeing. Thanks!
        Hide
        Patrick Hunt added a comment -

        That's great, thanks for following up!

        Show
        Patrick Hunt added a comment - That's great, thanks for following up!
        Hide
        Mahadev konar added a comment -

        I am more inclined towards what flavio mentioned above. To reduce the number of changes I think its best we dont remove the duplicate NEWLEADER. Ben any thoughts?

        Show
        Mahadev konar added a comment - I am more inclined towards what flavio mentioned above. To reduce the number of changes I think its best we dont remove the duplicate NEWLEADER. Ben any thoughts?
        Hide
        Patrick Hunt added a comment -

        mahadev - jeremy mentioned that he tested with the second version of the patch, is that going to be a problem? (re confirmation of fix).

        Show
        Patrick Hunt added a comment - mahadev - jeremy mentioned that he tested with the second version of the patch, is that going to be a problem? (re confirmation of fix).
        Hide
        Jeremy Stribling added a comment -

        Patch 1 also seems to do the trick.

        Show
        Jeremy Stribling added a comment - Patch 1 also seems to do the trick.
        Hide
        Mahadev konar added a comment -

        I am going ahead and checking in Pat's patch. I have opened ZOOKEEPER-1324 to track the duplicate NEWLEADER packets. Just being paranoid here and making minimal changes for the RC.

        Show
        Mahadev konar added a comment - I am going ahead and checking in Pat's patch. I have opened ZOOKEEPER-1324 to track the duplicate NEWLEADER packets. Just being paranoid here and making minimal changes for the RC.
        Hide
        Mahadev konar added a comment -

        I just committed Pat's patch to 3.4 and trunk. Thanks all for contributing on this!

        Show
        Mahadev konar added a comment - I just committed Pat's patch to 3.4 and trunk. Thanks all for contributing on this!
        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #1390 (See https://builds.apache.org/job/ZooKeeper-trunk/1390/)
        ZOOKEEPER-1319. Missing data after restarting+expanding a cluster. (phunt and breed via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212578
        Files :

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java
        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #1390 (See https://builds.apache.org/job/ZooKeeper-trunk/1390/ ) ZOOKEEPER-1319 . Missing data after restarting+expanding a cluster. (phunt and breed via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212578 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/Leader.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/FollowerResyncConcurrencyTest.java

          People

          • Assignee:
            Patrick Hunt
            Reporter:
            Jeremy Stribling
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development