ZooKeeper / ZOOKEEPER-582

ZooKeeper can revert to old data when a snapshot is created outside of normal processing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.1.1, 3.2.1
    • Fix Version/s: 3.1.2, 3.2.2, 3.3.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixed a bug that can cause ZooKeeper to revert to old data when a snapshot is created without a corresponding log.

      Description

      when zookeeper starts up it will restore the most recent state (latest zxid) it finds in the data directory. unfortunately, in the quorum version of zookeeper updates are logged using an epoch based on the latest log file in a directory. if there is a snapshot with a higher epoch than the log files, the zookeeper server will start logging using an epoch one higher than the highest log file.

      so if a data directory has a snapshot with an epoch of 27 and there are no log files, zookeeper will start logging changes using epoch 1. if the cluster restarts the state will be restored from the snapshot with the epoch of 27, which in effect, restores old data.

      normal operation of zookeeper will never result in this situation.

      this does not affect standalone zookeeper.

      a fix should make sure to use an epoch one higher than the current state, whether it comes from the snapshot or log, and should include a sanity check to make sure that a follower never connects to a leader that has a lower epoch than its own.
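
      The epoch selection described above can be sketched as follows. This is only an illustration, not the code from the attached patches: the class and method names are hypothetical, and it assumes a zxid is a 64-bit value with the epoch in the high 32 bits and the counter in the low 32 bits.

```java
public class EpochSelection {
    // Assumption: the epoch is the high 32 bits of a 64-bit zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Behavior described as buggy: only the latest log file is consulted.
    static long nextEpochFromLogsOnly(long lastLogZxid) {
        return epochOf(lastLogZxid) + 1;
    }

    // Behavior the description asks for: one higher than the current
    // state, whether that state comes from a snapshot or from a log.
    static long nextEpoch(long lastSnapshotZxid, long lastLogZxid) {
        long current = Math.max(lastSnapshotZxid, lastLogZxid);
        return epochOf(current) + 1;
    }

    public static void main(String[] args) {
        long snapshotZxid = (27L << 32) | 5L; // snapshot written in epoch 27
        long logZxid = 0L;                    // no log files in the directory

        System.out.println(nextEpochFromLogsOnly(logZxid));   // prints 1
        System.out.println(nextEpoch(snapshotZxid, logZxid)); // prints 28
    }
}
```

      With the logs-only rule, new changes are logged under epoch 1 and lose to the epoch 27 snapshot on the next restart. With the second rule they are logged under epoch 28 and win.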

      1. ZOOKEEPER-582.patch
        3 kB
        Mahadev konar
      2. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      3. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      4. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      5. ZOOKEEPER-582.patch
        14 kB
        Mahadev konar
      6. ZOOKEEPER-582_3.2.patch
        3 kB
        Mahadev konar
      7. ZOOKEEPER-582_3.2.patch
        14 kB
        Mahadev konar
      8. ZOOKEEPER-582_3.1.patch
        13 kB
        Mahadev konar
      9. test.patch
        13 kB
        Benjamin Reed

        Activity

        Patrick Hunt added a comment -

        As Ben mentioned we will never see this situation during normal operation of ZK.

        The case where we did see this was a result of a user running the migration tool that we provide to upgrade from version 2 to version 3 of ZooKeeper. The tool migrates the data by writing a single snapshot file where the zxid is maintained (it does not write a log file). As a result of the scenario Ben mentioned (snap with no associated log file) this could cause this bug to occur. If you have run the migration tool, documented here:
        http://hadoop.apache.org/zookeeper/docs/r3.0.0/releasenotes.html#migration_data
        you can verify whether or not you have this situation by looking at your ZooKeeper data directory.

        Here's an example:

        -rw-r--r-- 1 root search 67108880 Nov 17 19:31 log.300022b61
        -rw-r--r-- 1 root search 67108880 Nov 17 19:38 log.3000292d0
        -rw-r--r-- 1 root search 3646608 Nov 5 12:13 snapshot.1db5df6e2d6
        -rw-r--r-- 1 root search 3616579 Nov 17 19:31 snapshot.3000292c9
        -rw-r--r-- 1 root search 3616708 Nov 17 19:38 snapshot.300038d32

        where the file names are of the form <file>.<epoch><xid>,
        epoch and xid both being 4 byte values represented as hex (leading zeros of the epoch are not printed, so the xid is always the last 8 hex digits).

        Notice that snapshot.1db5df6e2d6 has an epoch of 0x1db, while the other
        files have an epoch of 0x3. This is the scenario described in the description of this
        JIRA (there is no log file associated with epoch 0x1db).

        If you see this in your datadir - a snapshot with an epoch where there are no log files with
        this same epoch, then this bug pertains. If you see snapshots of a particular epoch
        and log files with the same epoch then this bug does NOT pertain.
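
        The manual check above can also be sketched in code. This is a hypothetical helper, not a tool that ships with ZooKeeper: the class and method names are made up, and it assumes the file name suffix is the zxid in hex with the epoch in the high 32 bits, as in the listing above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OrphanSnapshotCheck {
    // Parse the hex suffix after the first '.' and keep the high 32 bits.
    static long epochOf(String fileName) {
        String hex = fileName.substring(fileName.indexOf('.') + 1);
        return Long.parseUnsignedLong(hex, 16) >>> 32;
    }

    // Return the epochs that have a snapshot file but no log file.
    static Set<Long> snapshotEpochsWithoutLogs(List<String> fileNames) {
        Set<Long> logEpochs = new TreeSet<>();
        Set<Long> snapshotEpochs = new TreeSet<>();
        for (String name : fileNames) {
            if (name.startsWith("log.")) {
                logEpochs.add(epochOf(name));
            } else if (name.startsWith("snapshot.")) {
                snapshotEpochs.add(epochOf(name));
            }
        }
        snapshotEpochs.removeAll(logEpochs);
        return snapshotEpochs;
    }

    public static void main(String[] args) {
        // File names taken from the example listing in this comment.
        List<String> names = Arrays.asList(
                "log.300022b61",
                "log.3000292d0",
                "snapshot.1db5df6e2d6",
                "snapshot.3000292c9",
                "snapshot.300038d32");
        for (long epoch : snapshotEpochsWithoutLogs(names)) {
            // prints: snapshot epoch without log files: 0x1db
            System.out.println("snapshot epoch without log files: 0x"
                    + Long.toHexString(epoch));
        }
    }
}
```

        An empty result means every snapshot epoch also has log files, which is the case where this bug does not pertain.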

        Benjamin Reed added a comment -

        this patch reproduces the problems outlined in this issue.

        Mahadev konar added a comment -

        this patch fixes the issue. I'll test out the patch tomorrow.

        Mahadev konar added a comment -

        a patch for 3.2 branch.

        Mahadev konar added a comment -

        a patch for 3.1 branch.

        Mahadev konar added a comment -

        this patch includes the fix and the test for trunk. I'll upload combined patches for the 3.1 and 3.2 branches.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425503/ZOOKEEPER-582.patch
        against trunk revision 881882.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/67/console

        This message is automatically generated.

        Mahadev konar added a comment -

        this patch fixes the issue with the FLE test. I'll upload the other patches for 3.1 and 3.2 as soon as hudson is done running this.

        Mahadev konar added a comment -

        looks like my eclipse settings added tabs to the indentation. fixed it in this patch.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425536/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/68/console

        This message is automatically generated.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425538/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/69/console

        This message is automatically generated.

        Benjamin Reed added a comment -

        looks good mahadev just two things:

        1) (minor) in getLastLoggedZxid() you should be using maxLogZxid instead of calling getLastLoggedZxid() again.

        2) when doing the sanity check with the leader's zxid you should be checking epochs, not zxids. it is possible for a follower to see something later and have to truncate from the same epoch, but a follower should never see a later epoch.
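
        Point 2 can be illustrated with a small sketch. It is not the code from the patch: the class and method names are hypothetical, and it assumes the epoch is the high 32 bits of the zxid.

```java
public class FollowerSanityCheck {
    // Assumption: the epoch is the high 32 bits of a 64-bit zxid.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // A follower may be ahead of the leader within the same epoch (it
    // then truncates), so only the epochs are compared, not full zxids.
    static boolean leaderIsAcceptable(long followerLastZxid, long leaderZxid) {
        return epochOf(leaderZxid) >= epochOf(followerLastZxid);
    }

    public static void main(String[] args) {
        long leader = (3L << 32) | 0x10L;

        // Same epoch, follower slightly ahead: allowed, follower truncates.
        System.out.println(leaderIsAcceptable((3L << 32) | 0x20L, leader)); // prints true

        // Follower has state from a later epoch: refuse to connect.
        System.out.println(leaderIsAcceptable((0x1dbL << 32) | 1L, leader)); // prints false
    }
}
```

        Comparing full zxids instead would wrongly reject the first case, where the follower only needs to truncate.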

        Mahadev konar added a comment -

        for 1) good catch, i missed that.

        for 2) good point, I'll fix that.

        Mahadev konar added a comment -

        addressed ben's comments in this patch.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12425653/ZOOKEEPER-582.patch
        against trunk revision 882313.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h8.grid.sp2.yahoo.net/71/console

        This message is automatically generated.

        Mahadev konar added a comment -

        latest patch for 3.2 branch. I ran the tests and they pass.

        Mahadev konar added a comment -

        latest patch for the 3.1 branch. I ran the tests and they pass on this branch as well.

        Mahadev konar added a comment -

        attached the wrong patch for 3.1 .. attaching again.

        Mahadev konar added a comment -

        looks like I am really tired. this time i think it's the correct file!

        Benjamin Reed added a comment -

        +1 great job mahadev!

        Mahadev konar added a comment -

        I just committed this to 3.1, 3.2 and trunk. thanks ben!

        Hudson added a comment -

        Integrated in ZooKeeper-trunk #546 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/546/)
        ZOOKEEPER-582. ZooKeeper can revert to old data when a snapshot is created outside of normal processing (ben reed and mahadev via mahadev)


          People

          • Assignee:
            Mahadev konar
            Reporter:
            Benjamin Reed