ZOOKEEPER-582

ZooKeeper can revert to old data when a snapshot is created outside of normal processing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.2.1
    • Fix Version/s: 3.1.2, 3.2.2, 3.3.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed a bug that could cause ZooKeeper to revert to old data when a snapshot is created without a corresponding log.

      Description

      When ZooKeeper starts up, it restores the most recent state (the latest zxid) it finds in the data directory. Unfortunately, in the quorum version of ZooKeeper, updates are logged using an epoch based on the latest log file in the directory. If there is a snapshot with a higher epoch than the log files, the ZooKeeper server will start logging using an epoch one higher than the highest log file's epoch, which is no higher than the snapshot's epoch.

      So if a data directory has a snapshot with an epoch of 27 and no log files, ZooKeeper will start logging changes using epoch 1. If the cluster then restarts, the state is restored from the epoch-27 snapshot, because its zxid is higher than any epoch-1 zxid; the changes logged under epoch 1 are skipped, which in effect restores old data.
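
      The arithmetic behind this failure can be sketched in Java. This is a simplified model, not ZooKeeper's code: it assumes a zxid stores the epoch in its high 32 bits and a counter in its low 32 bits, models restore as "keep the state with the larger zxid", and uses hypothetical names (EpochBugSketch, buggyNewEpoch, restoredZxid).

```java
// Simplified model of the pre-fix behavior; not ZooKeeper's actual code.
// Assumption: a zxid holds the epoch in its high 32 bits and a counter in its low 32 bits.
final class EpochBugSketch {

    // Pack an epoch and a counter into one 64-bit zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Extract the epoch (high 32 bits) from a zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Pre-fix rule (hypothetical helper): one epoch above the newest log file,
    // ignoring snapshots entirely. With no log files, highestLogZxid is 0.
    static long buggyNewEpoch(long highestLogZxid) {
        return epochOf(highestLogZxid) + 1;
    }

    // Restart, simplified: the server keeps whichever state has the latest zxid.
    static long restoredZxid(long snapshotZxid, long highestLogZxid) {
        return Math.max(snapshotZxid, highestLogZxid);
    }

    public static void main(String[] args) {
        long snapshotZxid = makeZxid(27, 100); // epoch-27 snapshot, no log files
        long epoch = buggyNewEpoch(0L);        // logging starts in epoch 1, not 28
        long lastLogged = makeZxid(epoch, 1);  // a new update logged under epoch 1

        long restored = restoredZxid(snapshotZxid, lastLogged);
        System.out.println("new epoch = " + epoch);                  // new epoch = 1
        System.out.println("restored epoch = " + epochOf(restored)); // restored epoch = 27
    }
}
```

      The epoch-1 update has a smaller zxid than the snapshot, so the restart returns to the epoch-27 state.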

      Normal operation of ZooKeeper will never result in this situation.

      This does not affect standalone ZooKeeper.

      A fix should make sure to use an epoch one higher than the current state, whether that state comes from the snapshot or the log, and should include a sanity check to make sure that a follower never connects to a leader that has a lower epoch than its own.
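
      A minimal sketch of the proposed epoch rule, assuming the epoch is the high 32 bits of a zxid. EpochFixSketch and newEpoch are hypothetical names; the committed patch may structure this differently.

```java
// Sketch of the proposed rule: start one epoch above the current state,
// whether that state came from the newest snapshot or the newest log entry.
// Assumption: the epoch is the high 32 bits of a zxid; names are hypothetical.
final class EpochFixSketch {

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Pass 0 for lastSnapshotZxid or lastLoggedZxid when no snapshot or log exists.
    static long newEpoch(long lastSnapshotZxid, long lastLoggedZxid) {
        long currentZxid = Math.max(lastSnapshotZxid, lastLoggedZxid);
        return epochOf(currentZxid) + 1;
    }

    public static void main(String[] args) {
        long snapshotZxid = (27L << 32) | 100L; // epoch-27 snapshot, no log files
        System.out.println("new epoch = " + newEpoch(snapshotZxid, 0L)); // new epoch = 28
    }
}
```

      Taking the maximum of both sources means a snapshot written without a matching log can no longer leave the server logging under an older epoch.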

      1. ZOOKEEPER-582_3.1.patch
        13 kB
        Mahadev konar
      2. ZOOKEEPER-582_3.2.patch
        14 kB
        Mahadev konar
      3. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      4. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      5. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      6. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      7. ZOOKEEPER-582_3.2.patch
        3 kB
        Mahadev konar
      8. ZOOKEEPER-582.patch
        3 kB
        Mahadev konar
      9. test.patch
        13 kB
        Benjamin Reed

        Activity

        Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Patch Available -> Open       5h 28m                  3                 Benjamin Reed   19/Nov/09 23:19
        Open -> Patch Available       2d 17h 12m              4                 Mahadev konar   20/Nov/09 19:38
        Patch Available -> Resolved   2h 52m                  1                 Mahadev konar   20/Nov/09 22:30
        Resolved -> Closed            125d 18h 54m            1                 Patrick Hunt    26/Mar/10 17:25
        Patrick Hunt made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #546 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/546/)
        ZOOKEEPER-582. ZooKeeper can revert to old data when a snapshot is created outside of normal processing (ben reed and mahadev via mahadev)

        Mahadev konar made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Release Note fixed bug in zookeeper that can lead zookeeper to revert to old data when a snapshot is created without a corresponding log.
        Resolution Fixed [ 1 ]
        Mahadev konar added a comment -

        I just committed this to 3.1, 3.2 and trunk. Thanks, Ben!

        Benjamin Reed made changes -
        Hadoop Flags [Reviewed]
        Benjamin Reed added a comment -

        +1, great job Mahadev!

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425673 ]
        Mahadev konar added a comment -

        Looks like I am really tired. This time I think it's the correct file!

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425672 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425672 ]
        Mahadev konar added a comment -

        Attached the wrong patch for 3.1; attaching again.

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425411 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425671 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425671 ]
        Mahadev konar added a comment -

        Latest patch for the 3.1 branch. I ran the tests and they pass on this branch as well.

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.2.patch [ 12425670 ]
        Mahadev konar added a comment -

        Latest patch for the 3.2 branch. I ran the tests and they pass.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425653/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/console

        This message is automatically generated.

        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582.patch [ 12425653 ]
        Mahadev konar added a comment -

        Addressed Ben's comments in this patch.

        Mahadev konar added a comment -

        For 1), good catch. I missed that.

        For 2), good point. I'll fix that.

        Benjamin Reed made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Benjamin Reed added a comment -

        Looks good, Mahadev, just two things:

        1) (minor) In getLastLoggedZxid() you should be using maxLogZxid instead of calling getLastLoggedZxid() again.

        2) When doing the sanity check with the leader's zxid, you should be checking epochs, not zxids. It is possible for a follower to see something later and have to truncate from the same epoch, but a follower should never see a later epoch.
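
        A sketch of the epoch-based sanity check described in point 2, assuming the epoch is the high 32 bits of a zxid. FollowerEpochCheck, checkLeader and accepts are hypothetical names, and throwing IllegalStateException is an illustrative choice, not necessarily what the patch does.

```java
// Hypothetical follower-side check: compare epochs, not full zxids.
// Assumption: the epoch is the high 32 bits of a zxid.
final class FollowerEpochCheck {

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // A follower may be ahead of the leader within the same epoch (it then
    // truncates), but it must never hold a later epoch than the leader.
    static void checkLeader(long followerZxid, long leaderZxid) {
        long followerEpoch = epochOf(followerZxid);
        long leaderEpoch = epochOf(leaderZxid);
        if (followerEpoch > leaderEpoch) {
            throw new IllegalStateException("Leader epoch 0x" + Long.toHexString(leaderEpoch)
                    + " is lower than follower epoch 0x" + Long.toHexString(followerEpoch));
        }
    }

    // Convenience wrapper: true if the follower may connect to this leader.
    static boolean accepts(long followerZxid, long leaderZxid) {
        try {
            checkLeader(followerZxid, leaderZxid);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same epoch, follower slightly ahead: allowed (the follower truncates).
        System.out.println(accepts((5L << 32) | 10L, (5L << 32) | 7L));
        // Follower in a later epoch than the leader: rejected.
        System.out.println(accepts((6L << 32) | 1L, (5L << 32) | 7L));
    }
}
```

        Comparing full zxids would wrongly reject the first case, which is why the check looks only at the epoch.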

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425538/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/console

        This message is automatically generated.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425536/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/console

        This message is automatically generated.

        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582.patch [ 12425538 ]
        Mahadev konar added a comment -

        Looks like my Eclipse settings added tabs to the indentation. Fixed it in this patch.

        Mahadev konar made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582.patch [ 12425536 ]
        Mahadev konar added a comment -

        This patch fixes the issue with the FLE test. I'll upload the other patches for 3.1 and 3.2 as soon as Hudson is done running this.

        Patrick Hunt made changes -
        Fix Version/s 3.3.0 [ 12313976 ]
        Patrick Hunt made changes -
        Assignee Mahadev konar [ mahadev ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425503/ZOOKEEPER-582.patch
        against trunk revision 881882.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/console

        This message is automatically generated.

        Mahadev konar made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Mahadev konar made changes -
        Attachment ZOOKEEPER-582.patch [ 12425503 ]
        Mahadev konar added a comment -

        This patch includes the fix and the test for trunk. I'll upload combined patches for the 3.1 and 3.2 branches.

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.1.patch [ 12425411 ]
        Mahadev konar added a comment -

        A patch for the 3.1 branch.

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582_3.2.patch [ 12425410 ]
        Mahadev konar added a comment -

        A patch for the 3.2 branch.

        Mahadev konar made changes -
        Attachment ZOOKEEPER-582.patch [ 12425320 ]
        Mahadev konar added a comment -

        This patch fixes the issue. I'll test out the patch tomorrow.

        Benjamin Reed made changes -
        Attachment test.patch [ 12425316 ]
        Benjamin Reed added a comment -

        This patch reproduces the problems outlined in this issue.

        Patrick Hunt added a comment -

        As Ben mentioned, we will never see this situation during normal operation of ZK.

        The case where we did see it was the result of a user running the migration tool we provide to upgrade from version 2 to version 3 of ZooKeeper. The tool migrates the data by writing a single snapshot file in which the zxid is preserved; it does not write a log file. That is exactly the scenario Ben described (a snapshot with no associated log file), so it can trigger this bug. If you have run the migration tool, documented here:
        http://hadoop.apache.org/zookeeper/docs/r3.0.0/releasenotes.html#migration_data
        you can verify whether you are affected by looking at your ZooKeeper data directory.

        Here's an example:

        -rw-r--r-- 1 root search 67108880 Nov 17 19:31 log.300022b61
        -rw-r--r-- 1 root search 67108880 Nov 17 19:38 log.3000292d0
        -rw-r--r-- 1 root search  3646608 Nov  5 12:13 snapshot.1db5df6e2d6
        -rw-r--r-- 1 root search  3616579 Nov 17 19:31 snapshot.3000292c9
        -rw-r--r-- 1 root search  3616708 Nov 17 19:38 snapshot.300038d32

        where the files are named <file>.<epoch><xid>, the epoch and xid each being
        4-byte values written in hex (so the xid occupies the last 8 hex digits and
        the epoch is whatever precedes them).

        Notice that snapshot.1db5df6e2d6 has an epoch of 0x1db, while the other
        files have an epoch of 0x3. This is the scenario described in the description
        of this JIRA: there is no log file associated with epoch 0x1db.

        If your data directory contains a snapshot with an epoch for which there are no
        log files with that same epoch, this bug applies. If you see snapshots of a
        particular epoch and log files with that same epoch, this bug does NOT apply.
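
        The check above can be scripted. The sketch below is a hypothetical helper, not a ZooKeeper tool: it takes a list of file names (obtaining the list, e.g. by listing the data directory, is left out), parses the hex suffix as a zxid, takes the high 32 bits as the epoch, and reports every snapshot whose epoch has no log file, following the rule above literally. It needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: flag snapshots whose epoch has no log file with the same epoch.
final class OrphanSnapshotCheck {

    // Names look like <prefix>.<hex zxid>; the epoch is the high 32 bits.
    static long epochOf(String fileName) {
        String hex = fileName.substring(fileName.indexOf('.') + 1);
        return Long.parseLong(hex, 16) >>> 32;
    }

    static List<String> orphanSnapshots(List<String> fileNames) {
        // Collect every epoch that has at least one log file.
        Set<Long> logEpochs = new TreeSet<>();
        for (String name : fileNames) {
            if (name.startsWith("log.")) {
                logEpochs.add(epochOf(name));
            }
        }
        // Report snapshots whose epoch never appears among the log files.
        List<String> orphans = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith("snapshot.") && !logEpochs.contains(epochOf(name))) {
                orphans.add(name);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<String> dataDir = List.of(
                "log.300022b61", "log.3000292d0",
                "snapshot.1db5df6e2d6", "snapshot.3000292c9", "snapshot.300038d32");
        System.out.println(orphanSnapshots(dataDir)); // [snapshot.1db5df6e2d6]
    }
}
```

        Run on the example listing, it flags only snapshot.1db5df6e2d6 (epoch 0x1db), since both log files belong to epoch 0x3.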

        Patrick Hunt made changes -
        Field Original Value New Value
        Fix Version/s 3.1.2 [ 12314394 ]
        Affects Version/s 3.1.1 [ 12313649 ]
        Priority Major [ 3 ] Blocker [ 1 ]
        Benjamin Reed created issue -

          People

          • Assignee:
            Mahadev konar
            Reporter:
            Benjamin Reed
          • Votes:
            0
            Watchers:
            2
