ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1624

PrepRequestProcessor abort multi-operation incorrectly

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.4.6, 3.5.0
    • Component/s: server
    • Labels:

      Description

      We found this issue when trying to issue multiple instances of the following multi-op concurrently

      multi

      { 1. create sequential node /a- 2. create node /b }

      The expected result is that only the first multi-op request should success and the rest of request should fail because /b is already exist

      However, the reported result is that the subsequence multi-op failed because of sequential node creation failed which is not possible.

      Below is the return code for each sub-op when issuing 3 instances of the above multi-op asynchronously

      1. ZOK, ZOK
      2. ZOK, ZNODEEXISTS,
      3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,

      When I added more debug log. The cause is that PrepRequestProcessor rollback outstandingChanges of the second multi-op incorrectly causing sequential node name generation to be incorrect. Below is the sequential node name generated by PrepRequestProcessor

      1. create /a-0001
      2. create /a-0003
      3. create /a-0001

      The bug is getPendingChanges() method. In failed to copied ChangeRecord for the parent node ("/"). So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op

      The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure if there is other implication or not.

      1. ZOOKEEPER-1624-3.4
        4 kB
        Camille Fournier
      2. ZOOKEEPER-1624.patch
        10 kB
        Thawan Kooburat
      3. ZOOKEEPER-1624.patch
        10 kB
        Thawan Kooburat
      4. ZOOKEEPER-1624.patch
        4 kB
        Thawan Kooburat
      5. ZOOKEEPER-1624.patch
        4 kB
        Thawan Kooburat
      6. ZOOKEEPER-1624.patch
        4 kB
        Thawan Kooburat

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        36m 1s 1 Thawan Kooburat 18/Jan/13 03:47
        Patch Available Patch Available Resolved Resolved
        265d 15h 18m 1 Camille Fournier 10/Oct/13 20:06
        Resolved Resolved Closed Closed
        153d 23h 10m 1 Flavio Junqueira 13/Mar/14 18:16
        Flavio Junqueira made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Flavio Junqueira added a comment -

        Closing issues after releasing 3.4.6.

        Show
        Flavio Junqueira added a comment - Closing issues after releasing 3.4.6.
        Camille Fournier made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in ZooKeeper-trunk #2085 (See https://builds.apache.org/job/ZooKeeper-trunk/2085/)
        ZOOKEEPER-1624. PrepRequestProcessor abort multi-operation incorrectly. (thawan via camille) (camille: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531061)

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/c/tests/TestMulti.cc
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/MultiAsyncTransactionTest.java
        Show
        Hudson added a comment - SUCCESS: Integrated in ZooKeeper-trunk #2085 (See https://builds.apache.org/job/ZooKeeper-trunk/2085/ ) ZOOKEEPER-1624 . PrepRequestProcessor abort multi-operation incorrectly. (thawan via camille) (camille: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1531061 ) /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/c/tests/TestMulti.cc /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/MultiAsyncTransactionTest.java
        Hide
        Flavio Junqueira added a comment -

        No objection on my end.

        Show
        Flavio Junqueira added a comment - No objection on my end.
        Hide
        Camille Fournier added a comment -

        This patch just makes it apply to 3.4 and removes the Java test. If there are no objections I'll check it in.

        Show
        Camille Fournier added a comment - This patch just makes it apply to 3.4 and removes the Java test. If there are no objections I'll check it in.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12607473/ZOOKEEPER-1624-3.4
        against trunk revision 1530166.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1662//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12607473/ZOOKEEPER-1624-3.4 against trunk revision 1530166. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1662//console This message is automatically generated.
        Camille Fournier made changes -
        Attachment ZOOKEEPER-1624-3.4 [ 12607473 ]
        Hide
        Camille Fournier added a comment -

        patch for 3.4

        Show
        Camille Fournier added a comment - patch for 3.4
        Hide
        Flavio Junqueira added a comment -

        Given the recent set of comments, I'm not sure it matters, but my point was simply that adding async calls for a feature that already exists is not really adding a whole new feature, just extending the scope of an existing one. For me, the feature is multi. But fine if we can make progress without it.

        Show
        Flavio Junqueira added a comment - Given the recent set of comments, I'm not sure it matters, but my point was simply that adding async calls for a feature that already exists is not really adding a whole new feature, just extending the scope of an existing one. For me, the feature is multi. But fine if we can make progress without it.
        Hide
        Patrick Hunt added a comment -

        it is not really a whole new feature

        fwiw a public facing client API change that adds async support to multi is a feature imo.

        Show
        Patrick Hunt added a comment - it is not really a whole new feature fwiw a public facing client API change that adds async support to multi is a feature imo.
        Hide
        Camille Fournier added a comment -

        I'm comfortable with pushing the patch for 3.4 without the Java test.

        Show
        Camille Fournier added a comment - I'm comfortable with pushing the patch for 3.4 without the Java test.
        Hide
        Thawan Kooburat added a comment -

        As I already comment earlier, the current Java test doesn't actually catch the bug due to timing issue. I guess, I will have to rewrite it to test PrepRequestProcessor directly (which is probably not going to rely on ZOOKEEPER-1572)

        If you want to commit this now, the patch itself has a proper and reliable (at least on my box) unit test in C. Our test infrastructure do run C unit test and report the result right? I agree with Camile that it would be nice to have Java test for server-side functionality but it isn't strictly needed right?

        Show
        Thawan Kooburat added a comment - As I already comment earlier, the current Java test doesn't actually catch the bug due to timing issue. I guess, I will have to rewrite it to test PrepRequestProcessor directly (which is probably not going to rely on ZOOKEEPER-1572 ) If you want to commit this now, the patch itself has a proper and reliable (at least on my box) unit test in C. Our test infrastructure do run C unit test and report the result right? I agree with Camile that it would be nice to have Java test for server-side functionality but it isn't strictly needed right?
        Hide
        Flavio Junqueira added a comment -

        I understand that we typically don't add new features to bug fix releases, but I don't really see a problem with having ZOOKEEPER-1572 into b3.4, it is not really a whole new feature.

        Show
        Flavio Junqueira added a comment - I understand that we typically don't add new features to bug fix releases, but I don't really see a problem with having ZOOKEEPER-1572 into b3.4, it is not really a whole new feature.
        Hide
        Patrick Hunt added a comment -

        Can a test not be created that meets the criteria?

        Show
        Patrick Hunt added a comment - Can a test not be created that meets the criteria?
        Hide
        Camille Fournier added a comment -

        Right. So should we push the patch without the Java test in 3.4.6?

        Show
        Camille Fournier added a comment - Right. So should we push the patch without the Java test in 3.4.6?
        Hide
        Patrick Hunt added a comment -

        We typically don't add features in fix releases.

        Show
        Patrick Hunt added a comment - We typically don't add features in fix releases.
        Hide
        Camille Fournier added a comment -

        The problem is that ZOOKEEPER-1572 is necessary for the test he wrote for this. Do we want to push that into 3.4.6?

        Show
        Camille Fournier added a comment - The problem is that ZOOKEEPER-1572 is necessary for the test he wrote for this. Do we want to push that into 3.4.6?
        Hide
        Flavio Junqueira added a comment -

        Thawan Kooburat, would you have time to generate a patch for 3.4.6?

        Show
        Flavio Junqueira added a comment - Thawan Kooburat , would you have time to generate a patch for 3.4.6?
        Hide
        Flavio Junqueira added a comment -

        I'd like to have this one in for 3.4.6 if possible.

        Show
        Flavio Junqueira added a comment - I'd like to have this one in for 3.4.6 if possible.
        Hide
        Thawan Kooburat added a comment -

        For Java, I think the best approach is to unit test the PrepRequestProcessor itself directly (and mock the rest of system). With this approach, we won't need async interface. I haven't have time to rewrite the test yet but if we want this in 3.4.6, I will find some time to work on it after I am done with ZK-1551,1552

        Show
        Thawan Kooburat added a comment - For Java, I think the best approach is to unit test the PrepRequestProcessor itself directly (and mock the rest of system). With this approach, we won't need async interface. I haven't have time to rewrite the test yet but if we want this in 3.4.6, I will find some time to work on it after I am done with ZK-1551,1552
        Hide
        Flavio Junqueira added a comment -

        @thawan, Could you give me some feedback here as well, please?

        As for the Java test, I was thinking that to make a multi-op transaction fail reliably, you could use check() with a znode, version pair that doesn't match, which will cause the transaction to fail. If the transaction also includes the creation of a sequential znode, then you should be able to trigger this bug, no?

        Show
        Flavio Junqueira added a comment - @thawan, Could you give me some feedback here as well, please? As for the Java test, I was thinking that to make a multi-op transaction fail reliably, you could use check() with a znode, version pair that doesn't match, which will cause the transaction to fail. If the transaction also includes the creation of a sequential znode, then you should be able to trigger this bug, no?
        Hide
        Flavio Junqueira added a comment -

        I was wondering about what to do with this jira. We need to work on the java test case, and on top of it decide what to do for the 3.4 branch. I believe that we didn't check in ZOOKEEPER-1572 to the 3.4. branch because Mahadev konar said that we don't check in new features into an ongoing branch. Well, in this case, I'd say we should so that we can cleanly apply this patch, unless we come up with a way of testing that does not rely on the async multi api.

        On the java test case, Thawan Kooburat says that it doesn't pass, but the last run on jenkins returned +1 for the core unit tests. Did you mean to say that it doesn't pass reliably?

        Could you people give me some feedback here, please, Thawan Kooburat, Camille Fournier?

        Show
        Flavio Junqueira added a comment - I was wondering about what to do with this jira. We need to work on the java test case, and on top of it decide what to do for the 3.4 branch. I believe that we didn't check in ZOOKEEPER-1572 to the 3.4. branch because Mahadev konar said that we don't check in new features into an ongoing branch. Well, in this case, I'd say we should so that we can cleanly apply this patch, unless we come up with a way of testing that does not rely on the async multi api. On the java test case, Thawan Kooburat says that it doesn't pass, but the last run on jenkins returned +1 for the core unit tests. Did you mean to say that it doesn't pass reliably? Could you people give me some feedback here, please, Thawan Kooburat , Camille Fournier ?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12570079/ZOOKEEPER-1624.patch
        against trunk revision 1447982.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570079/ZOOKEEPER-1624.patch against trunk revision 1447982. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1402//console This message is automatically generated.
        Thawan Kooburat made changes -
        Attachment ZOOKEEPER-1624.patch [ 12570079 ]
        Hide
        Thawan Kooburat added a comment -

        This is a direct port of c-client test. However, I found that it cannot detect the bug because of a timing issue. I think it is because both server and client are in the same process for Java unit test, each multi request will get committed before the next one arrive, so the bug won't occur.

        On the other hand, with c-client test, the bug is always reproducible in my box.

        If anyone can help me on Java test I will be appreciated.

        Show
        Thawan Kooburat added a comment - This is a direct port of c-client test. However, I found that it cannot detect the bug because of a timing issue. I think it is because both server and client are in the same process for Java unit test, each multi request will get committed before the next one arrive, so the bug won't occur. On the other hand, with c-client test, the bug is always reproducible in my box. If anyone can help me on Java test I will be appreciated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12570055/ZOOKEEPER-1624.patch
        against trunk revision 1447982.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570055/ZOOKEEPER-1624.patch against trunk revision 1447982. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1400//console This message is automatically generated.
        Thawan Kooburat made changes -
        Attachment ZOOKEEPER-1624.patch [ 12570055 ]
        Hide
        Thawan Kooburat added a comment -

        Oops uploaded the wrong file. This one should have it

        Show
        Thawan Kooburat added a comment - Oops uploaded the wrong file. This one should have it
        Hide
        Camille Fournier added a comment -

        This new patch still doesn't seem to have the Java test.

        Show
        Camille Fournier added a comment - This new patch still doesn't seem to have the Java test.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12570018/ZOOKEEPER-1624.patch
        against trunk revision 1447621.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570018/ZOOKEEPER-1624.patch against trunk revision 1447621. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1398//console This message is automatically generated.
        Thawan Kooburat made changes -
        Attachment ZOOKEEPER-1624.patch [ 12570018 ]
        Hide
        Thawan Kooburat added a comment -

        A version of with Java unit test.

        Since this version relies on ZOOKEEPER-1572, it can only be applied to trunk. For 3.4.6, we have to use the previous version.

        Show
        Thawan Kooburat added a comment - A version of with Java unit test. Since this version relies on ZOOKEEPER-1572 , it can only be applied to trunk. For 3.4.6, we have to use the previous version.
        Hide
        Camille Fournier added a comment -

        Can we get a test for this in Java now that ZOOKEEPER-1572 has been committed? I can't get the cppunit build to run and I'm uncomfortable approving a patch where I can't see the tests fail and pass.

        Show
        Camille Fournier added a comment - Can we get a test for this in Java now that ZOOKEEPER-1572 has been committed? I can't get the cppunit build to run and I'm uncomfortable approving a patch where I can't see the tests fail and pass.
        Patrick Hunt made changes -
        Fix Version/s 3.4.6 [ 12323310 ]
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12567638/ZOOKEEPER-1624.patch
        against trunk revision 1438375.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 26 release audit warnings (more than the trunk's current 24 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12567638/ZOOKEEPER-1624.patch against trunk revision 1438375. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 26 release audit warnings (more than the trunk's current 24 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1375//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        The fix would be backported to 3.4, right ?

        Show
        Ted Yu added a comment - The fix would be backported to 3.4, right ?
        Hide
        Camille Fournier added a comment -

        Oh that's great Thawan, I will see about getting 1572 committed so we can get an easy Java-side test for this bug.

        Show
        Camille Fournier added a comment - Oh that's great Thawan, I will see about getting 1572 committed so we can get an easy Java-side test for this bug.
        Hide
        Thawan Kooburat added a comment -

        It should be simple to replicate if ZOOKEEPER-1572 is committed to trunk. We just need async functionality to test this.

        Show
        Thawan Kooburat added a comment - It should be simple to replicate if ZOOKEEPER-1572 is committed to trunk. We just need async functionality to test this.
        Hide
        Camille Fournier added a comment -

        I started looking at this. I am very unhappy that we have to write a C client test to reveal a bug in our Java core. I'm working on writing a java test to replicate because we really should not have to rely on tests that should only be for the client side code to test our server.

        Show
        Camille Fournier added a comment - I started looking at this. I am very unhappy that we have to write a C client test to reveal a bug in our Java core. I'm working on writing a java test to replicate because we really should not have to rely on tests that should only be for the client side code to test our server.
        Thawan Kooburat made changes -
        Attachment ZOOKEEPER-1624.patch [ 12567638 ]
        Hide
        Thawan Kooburat added a comment -

        Clean up wait counter so that future test can run correctly

        Show
        Thawan Kooburat added a comment - Clean up wait counter so that future test can run correctly
        Hide
        Thawan Kooburat added a comment -

        Any multi-op expert can help on initial feedback?

        Show
        Thawan Kooburat added a comment - Any multi-op expert can help on initial feedback?
        Mahadev konar made changes -
        Labels review zk-review
        Mahadev konar made changes -
        Labels review
        Mahadev konar made changes -
        Fix Version/s 3.5.0 [ 12316644 ]
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12565429/ZOOKEEPER-1624.patch
        against trunk revision 1434978.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12565429/ZOOKEEPER-1624.patch against trunk revision 1434978. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1343//console This message is automatically generated.
        Thawan Kooburat made changes -
        Description We found this issue when trying to issue multiple instances of the following multi-op concurrently

        multi {
        1. create sequential node /a-
        2. create node /b
        }

        The expected result is that only the first multi-op request should success and the rest of request should fail because /b is already exist

        However, the reported result is that the subsequence multi-op failed because of sequential node creation failed which is not possible.

        Below is the return code for each sub-op when issuing 3 instances of the above multi-op asynchronously

        1. ZOK, ZOK
        2. ZOK, ZNODEEXISTS,
        3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,

        When I added more debug log. The cause is that PrepRequestProcessor rollback outstandingChanges of the second multi-op incorrectly causing sequential node name generation to be incorrect. Below is the sequential node name generated by PrepRequestProcessor

        1. create /a-0001
        2. create /a-0003
        3. create /a-0001

        The bug is getPendingChanges() method. In failed to copied ChangeRecord for the parent node (/). So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op

        The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure if there is other implication or not.
        We found this issue when trying to issue multiple instances of the following multi-op concurrently

        multi {
        1. create sequential node /a-
        2. create node /b
        }

        The expected result is that only the first multi-op request should success and the rest of request should fail because /b is already exist

        However, the reported result is that the subsequence multi-op failed because of sequential node creation failed which is not possible.

        Below is the return code for each sub-op when issuing 3 instances of the above multi-op asynchronously

        1. ZOK, ZOK
        2. ZOK, ZNODEEXISTS,
        3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY,

        When I added more debug log. The cause is that PrepRequestProcessor rollback outstandingChanges of the second multi-op incorrectly causing sequential node name generation to be incorrect. Below is the sequential node name generated by PrepRequestProcessor

        1. create /a-0001
        2. create /a-0003
        3. create /a-0001

        The bug is getPendingChanges() method. In failed to copied ChangeRecord for the parent node ("/"). So rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op

        The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure if there is other implication or not.
        Thawan Kooburat made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Thawan Kooburat [ thawan ]
        Thawan Kooburat made changes -
        Field Original Value New Value
        Attachment ZOOKEEPER-1624.patch [ 12565429 ]
        Hide
        Thawan Kooburat added a comment -

        Since we don't have async multi-op in Java, the unit test have to be in c-client unit test

        Show
        Thawan Kooburat added a comment - Since we don't have async multi-op in Java, the unit test have to be in c-client unit test
        Thawan Kooburat created issue -

          People

          • Assignee:
            Thawan Kooburat
            Reporter:
            Thawan Kooburat
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development