Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: 3.0.0
    • Component/s: java client, server, tests
    • Labels:
      None
    • Environment:

      Linux corner-cube 2.6.24-19-generic #1 SMP Fri Jul 11 23:41:49 UTC 2008 i686 GNU/Linux
      java version "1.6.0_06"
      Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
      Java HotSpot(TM) Client VM (build 10.0-b22, mixed mode, sharing)

      Description

      StatCallback appears to be broken in trunk. I'll attach a patch for AsyncTest that triggers the behaviour.

      1. stat-callback-fail-test.diff
        2 kB
        Stu Hood
      2. stat-callback-test.diff
        2 kB
        Stu Hood

        Issue Links

          Activity

          Hide
          Stu Hood added a comment -

          Applies against r687111.

          Show
          Stu Hood added a comment - Applies against r687111.
          Hide
          Mahadev konar added a comment -

          stu,
          do you mean that the tests are broken? I am not sure what the jira means? Are you not getting statcallbacks?

          Show
          Mahadev konar added a comment - stu, do you mean that the tests are broken? I am not sure what the jira means? Are you not getting statcallbacks?
          Hide
          Stu Hood added a comment -

          The test shows that StatCallback does not get triggered (there are remaining outstanding calls at the end of the test, so it fails).

          We have noticed this in our code when using the latest trunk of ZooKeeper, so I wanted to try and reproduce it in the ZooKeeper testsuite.

          Show
          Stu Hood added a comment - The test shows that StatCallback does not get triggered (there are remaining outstanding calls at the end of the test, so it fails). We have noticed this in our code when using the latest trunk of ZooKeeper, so I wanted to try and reproduce it in the ZooKeeper testsuite.
          Hide
          Stu Hood added a comment -

          Sorry, it appears I left some code commented.

          If you uncomment that block, it will make an exists (StatCallback) call, and increment the outstanding requests. The HammerThread implements StatCallback to decrement the outstanding requests. Therefore, at the end of the test getOutstanding() == 0. Correct?

          It fails consistently on my system with 0 != 32.

          Show
          Stu Hood added a comment - Sorry, it appears I left some code commented. If you uncomment that block, it will make an exists (StatCallback) call, and increment the outstanding requests. The HammerThread implements StatCallback to decrement the outstanding requests. Therefore, at the end of the test getOutstanding() == 0. Correct? It fails consistently on my system with 0 != 32.
          Hide
          Mahadev konar added a comment -

          assigning it to 3.0 to investigate.

          Show
          Mahadev konar added a comment - assigning it to 3.0 to investigate.
          Hide
          Mahadev konar added a comment -

          stu,
          i just tried the patch on trunk and saw that the test passes on my machine.... what OS are you running on?

          Show
          Mahadev konar added a comment - stu, i just tried the patch on trunk and saw that the test passes on my machine.... what OS are you running on?
          Hide
          Stu Hood added a comment -

          Its listed above in the Environment section.

          Did you uncomment the two lines I mentioned? When I get back to that machine, I'll test it again with trunk.

          Thanks!

          Show
          Stu Hood added a comment - Its listed above in the Environment section. Did you uncomment the two lines I mentioned? When I get back to that machine, I'll test it again with trunk. Thanks!
          Hide
          Stu Hood added a comment -

          Just tested again on the listed machine with r693912, and I'm still seeing the exact same results.

          I'm reattaching the patch, with the two lines I mentioned uncommented, for convenience.

          Show
          Stu Hood added a comment - Just tested again on the listed machine with r693912, and I'm still seeing the exact same results. I'm reattaching the patch, with the two lines I mentioned uncommented, for convenience.
          Hide
          Mahadev konar added a comment -

          thanks stu. i apologize, i might have forgotten to un comment the lines you mentioned. ill take a look at this ... thanks again. ..

          Show
          Mahadev konar added a comment - thanks stu. i apologize, i might have forgotten to un comment the lines you mentioned. ill take a look at this ... thanks again. ..
          Hide
          Patrick Hunt added a comment -

          I'll look into this, I have a very similar environment.

          Show
          Patrick Hunt added a comment - I'll look into this, I have a very similar environment.
          Hide
          Patrick Hunt added a comment -

          This is failing on my machine as well.

          1) I think this is related to ZOOKEEPER-137 - there is a flaw in the watcher handling if the same watcher object is registered on the same node for data watches while async operations are performed on the node

          2) the test as modified by the current attached patch is flawed. when bang is set to false all of the hammer threads will immediately close their zk clients and exit the run method. pending callbacks are dropped and the counters will not be updated properly. to make the test valid we'd have to have the hammer threads wait (for some bounded amt of time) while they still had outstanding requests. The testHammer() driver method would also have to remove the added sleep(5000) because I don't think this is enough time for all of the hammers to exit cleanly (before interrupting them).

          Show
          Patrick Hunt added a comment - This is failing on my machine as well. 1) I think this is related to ZOOKEEPER-137 - there is a flaw in the watcher handling if the same watcher object is registered on the same node for data watches while async operations are performed on the node 2) the test as modified by the current attached patch is flawed. when bang is set to false all of the hammer threads will immediately close their zk clients and exit the run method. pending callbacks are dropped and the counters will not be updated properly. to make the test valid we'd have to have the hammer threads wait (for some bounded amt of time) while they still had outstanding requests. The testHammer() driver method would also have to remove the added sleep(5000) because I don't think this is enough time for all of the hammers to exit cleanly (before interrupting them).
          Hide
          Patrick Hunt added a comment -

          This looks like ZOOKEEPER-137 to me.

          Show
          Patrick Hunt added a comment - This looks like ZOOKEEPER-137 to me.
          Hide
          Patrick Hunt added a comment -

          Stu, btw. You do know that if you call something like:

          watcher = new watcher();
          zk.exists("/foo", watcher); // sync or async doesn't matter
          zk.exists("/foo", watcher);
          zk.exists("/foo", watcher);

          that only a single event will ever come back to the watcher, correct? I mean that we don't store duplicates (where the node name and watcher together define uniqueness).

          Of course

          zk.exists("/foo, new watcher()); // sync or async doesn't matter
          zk.exists("/foo", new watcher());
          zk.exists("/foo", new watcher());

          would result in three events triggered on the client (one for each unique watcher)

          Show
          Patrick Hunt added a comment - Stu, btw. You do know that if you call something like: watcher = new watcher(); zk.exists("/foo", watcher); // sync or async doesn't matter zk.exists("/foo", watcher); zk.exists("/foo", watcher); that only a single event will ever come back to the watcher, correct? I mean that we don't store duplicates (where the node name and watcher together define uniqueness). Of course zk.exists("/foo, new watcher()); // sync or async doesn't matter zk.exists("/foo", new watcher()); zk.exists("/foo", new watcher()); would result in three events triggered on the client (one for each unique watcher)
          Hide
          Stu Hood added a comment -

          I understand that a watch can only ever being triggered once and that only one watch can exist per client and znode. But I'm not sure how it is relevant to this test: each znode in the test is created with CreateMode.PERSISTENT_SEQUENTIAL, so multiple threads adding watches should never result in the condition you mentioned.

          Thanks!

          Show
          Stu Hood added a comment - I understand that a watch can only ever being triggered once and that only one watch can exist per client and znode. But I'm not sure how it is relevant to this test: each znode in the test is created with CreateMode.PERSISTENT_SEQUENTIAL, so multiple threads adding watches should never result in the condition you mentioned. Thanks!
          Hide
          Patrick Hunt added a comment -

          I think there are 2 orthogonal issues here:

          1) I believe you are having a problem in your (real) client code, I think it's due to 137, dropping of events
          2) however the test you created is broken, it will fail regardless of 137. the problem is as I described, by closing the zk client when bang=false you will lose events (those not yet delivered to the watchers).

          I'm working on 1, you might try changing the test as I described and see if you still see 2 (or just wait for the fix for 1...)

          Show
          Patrick Hunt added a comment - I think there are 2 orthogonal issues here: 1) I believe you are having a problem in your (real) client code, I think it's due to 137, dropping of events 2) however the test you created is broken, it will fail regardless of 137. the problem is as I described, by closing the zk client when bang=false you will lose events (those not yet delivered to the watchers). I'm working on 1, you might try changing the test as I described and see if you still see 2 (or just wait for the fix for 1...)
          Hide
          Patrick Hunt added a comment -

          Stu, can you tell me if this issue was addressed with the fix for ZOOKEEPER-137 ?

          Show
          Patrick Hunt added a comment - Stu, can you tell me if this issue was addressed with the fix for ZOOKEEPER-137 ?
          Hide
          Stu Hood added a comment -

          I just spent an hour trying to reproduce the issue, only to discover that it was not caused by StatCallback at all.

          In fact, it was caused by the removal of the default sort for getChildren (doh), so you were mostly right about #1. And, therefore you were right about #2.

          Very sorry to have wasted your time, but I'm glad 137 is fixed. Thanks!

          Show
          Stu Hood added a comment - I just spent an hour trying to reproduce the issue, only to discover that it was not caused by StatCallback at all. In fact, it was caused by the removal of the default sort for getChildren (doh), so you were mostly right about #1. And, therefore you were right about #2. Very sorry to have wasted your time, but I'm glad 137 is fixed. Thanks!
          Hide
          Patrick Hunt added a comment -

          haha, that's funny. I love those.

          No worries. Thanks for the feedback and taking the time to followup.

          Show
          Patrick Hunt added a comment - haha, that's funny. I love those. No worries. Thanks for the feedback and taking the time to followup.
          Hide
          Patrick Hunt added a comment -

          3.0.0 has been released, closing issues.

          Show
          Patrick Hunt added a comment - 3.0.0 has been released, closing issues.

            People

            • Assignee:
              Unassigned
              Reporter:
              Stu Hood
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development