Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1, 3.5.0
    • Component/s: server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      From the mailing list:

      FileTxnSnapLog.restore contains a code block handling a NODEEXISTS failure during deserialization. The problem is explained there in a code comment. The code block however is only executed for a CREATE txn, not for a multiTxn containing a CREATE.

      Even if the mentioned code block would also be executed for multi transactions, it needs adaption for multi transactions. What, if after the first failed transaction in a multi txn during deserialization, there would be subsequent transactions in the same multi that would also have failed?
      We don't know, since the first failed transaction hides the information about the remaining transactions.

      1. ZOOKEEPER-1269.patch
        8 kB
        Camille Fournier

        Activity

        Hide
        Hudson added a comment -

        Integrated in ZooKeeper-trunk #1392 (See https://builds.apache.org/job/ZooKeeper-trunk/1392/)
        ZOOKEEPER-1269. Multi deserialization issues. (Camille Fournier via mahadev)

        mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212663
        Files :

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
        Show
        Hudson added a comment - Integrated in ZooKeeper-trunk #1392 (See https://builds.apache.org/job/ZooKeeper-trunk/1392/ ) ZOOKEEPER-1269 . Multi deserialization issues. (Camille Fournier via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212663 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/LoadFromLogTest.java
        Hide
        Mahadev konar added a comment -

        I just committed this. Thanks Camille and Ben!

        Show
        Mahadev konar added a comment - I just committed this. Thanks Camille and Ben!
        Hide
        Benjamin Reed added a comment -

        +1 excellent work camille!

        Show
        Benjamin Reed added a comment - +1 excellent work camille!
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch
        against trunk revision 1208979.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch against trunk revision 1208979. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/815//console This message is automatically generated.
        Hide
        Marshall McMullen added a comment -

        I agree with Camille. I looked the code over a while back and from a Multi perspective looks correct.

        Show
        Marshall McMullen added a comment - I agree with Camille. I looked the code over a while back and from a Multi perspective looks correct.
        Hide
        Camille Fournier added a comment -

        This is a reasonably big bug to just leave outstanding for this long, can someone please review this and check it in?

        Show
        Camille Fournier added a comment - This is a reasonably big bug to just leave outstanding for this long, can someone please review this and check it in?
        Hide
        Camille Fournier added a comment -

        I think it should go into both, since it is a bug with multi.

        Show
        Camille Fournier added a comment - I think it should go into both, since it is a bug with multi.
        Hide
        Mahadev konar added a comment -

        Camille, Should this go into 3.4 or just trunk?

        Show
        Mahadev konar added a comment - Camille, Should this go into 3.4 or just trunk?
        Hide
        Camille Fournier added a comment -

        Hey guys, someone want to review and commit this? Looks like we got the OK from the multi folks.

        Show
        Camille Fournier added a comment - Hey guys, someone want to review and commit this? Looks like we got the OK from the multi folks.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch
        against trunk revision 1190450.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch against trunk revision 1190450. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/740//console This message is automatically generated.
        Hide
        Marshall McMullen added a comment -

        I agree with Benjamin's comment. In the PrepRequestProcessor we only apply the multi-op if we know all the contained ops will succeed. If any of them fail, the whole thing gets thrown away and we rollback. So there should be no multi-op specific handling that needs to be done in your patch. I looked over the patch and I don't see any problems.

        Show
        Marshall McMullen added a comment - I agree with Benjamin's comment. In the PrepRequestProcessor we only apply the multi-op if we know all the contained ops will succeed. If any of them fail, the whole thing gets thrown away and we rollback. So there should be no multi-op specific handling that needs to be done in your patch. I looked over the patch and I don't see any problems.
        Hide
        Camille Fournier added a comment -

        Right, ok. So I think the patch attached to this issue does exactly that, if someone would like to review it. What I'm not sure is whether the test I put in is particularly good, so would really appreciate one of the multi experts taking a gander there.

        Show
        Camille Fournier added a comment - Right, ok. So I think the patch attached to this issue does exactly that, if someone would like to review it. What I'm not sure is whether the test I put in is particularly good, so would really appreciate one of the multi experts taking a gander there.
        Hide
        Benjamin Reed added a comment -

        we convert multi to idempotent transactions. by nature if an operation is a multi-op fails, there will not be any transaction there at all, so we should process the transactions resulting from multi-op in exactly the same way as normal transactions. the only difference is that we have to make sure that if one of the transactions get processed, they all do.

        Show
        Benjamin Reed added a comment - we convert multi to idempotent transactions. by nature if an operation is a multi-op fails, there will not be any transaction there at all, so we should process the transactions resulting from multi-op in exactly the same way as normal transactions. the only difference is that we have to make sure that if one of the transactions get processed, they all do.
        Hide
        Camille Fournier added a comment -

        Ah yes, being in the log is enough for it to be true were snapshots taken in a frozen system state. But since they are not, you can have these operations fail in playback due to concurrency issues. Multi isn't a special case above the other zk ops, they all have this potential race.

        Show
        Camille Fournier added a comment - Ah yes, being in the log is enough for it to be true were snapshots taken in a frozen system state. But since they are not, you can have these operations fail in playback due to concurrency issues. Multi isn't a special case above the other zk ops, they all have this potential race.
        Hide
        Ted Dunning added a comment -

        I am pretty sure that it SHOULD be true. Whether it is actually true is another matter. Being in the log is evidence that there were no reasons for the update to fail if the data store is actually working correctly.

        Again, the assumption of correct operation at this scale may be too strong.

        Show
        Ted Dunning added a comment - I am pretty sure that it SHOULD be true. Whether it is actually true is another matter. Being in the log is evidence that there were no reasons for the update to fail if the data store is actually working correctly. Again, the assumption of correct operation at this scale may be too strong.
        Hide
        Camille Fournier added a comment -
        Show
        Camille Fournier added a comment - Are you sure about that given https://issues.apache.org/jira/browse/ZOOKEEPER-1046?
        Hide
        Ted Dunning added a comment -

        Once a multi makes it to the log, it is pretty much guaranteed that all of the elements of the multi will succeed or log replay will totally fail.

        It may be a bit too cowboy to assume that this is really true, but the point of the idempotency property is to make this true.

        Show
        Ted Dunning added a comment - Once a multi makes it to the log, it is pretty much guaranteed that all of the elements of the multi will succeed or log replay will totally fail. It may be a bit too cowboy to assume that this is really true, but the point of the idempotency property is to make this true.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch
        against trunk revision 1190450.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 9 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501375/ZOOKEEPER-1269.patch against trunk revision 1190450. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/726//console This message is automatically generated.
        Hide
        Camille Fournier added a comment -

        The test here is a little bit vague, because I don't really understand how a "proper" but broken multitxn would look. Handling the error codes in FileTxnSnapLog is also a bit fuzzy. But I think the general refactor should fix the issue. Would be great if Marshall could take a look at this to verify.

        Show
        Camille Fournier added a comment - The test here is a little bit vague, because I don't really understand how a "proper" but broken multitxn would look. Handling the error codes in FileTxnSnapLog is also a bit fuzzy. But I think the general refactor should fix the issue. Would be great if Marshall could take a look at this to verify.

          People

          • Assignee:
            Camille Fournier
            Reporter:
            Camille Fournier
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development