ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-961

Watch recovery after disconnection when connection string contains a prefix

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.3.1
    • Fix Version/s: 3.3.4, 3.4.0
    • Component/s: java client
    • Labels:
      None
    • Environment:

      Windows 32 bits

    • Hadoop Flags:
      Reviewed
    • Tags:
      disconnected watch

      Description

      Let's say you're using connection string "127.0.0.1:2182/foo".
      1) put a childrenchanged watch on relative / (that is, on absolute path /foo)
      2) stop the zk server
      3) start the zk server
      4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the absolute path /

      • if some other client adds or removes a node under /foo, nothing will happen
      • if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error)
      1. ZOOKEEPER-961.patch
        16 kB
        Matthias Spycher
      2. ZOOKEEPER-961b.patch
        14 kB
        Mahadev konar
      3. ZOOKEEPER-961.patch
        50 kB
        Thomas Koch
      4. ZOOKEEPER-961b.patch
        14 kB
        Matthias Spycher
      5. ZOOKEEPER-961.patch
        5 kB
        Thomas Koch

        Issue Links

          Activity

          Hide
          Thomas Koch added a comment -

          I already proposed, that the chroot shouldn't be an attribute of the ClientCnxn class. Fixing this bug would be a good occasion to do this change.

          Show
          Thomas Koch added a comment - I already proposed, that the chroot shouldn't be an attribute of the ClientCnxn class. Fixing this bug would be a good occasion to do this change.
          Hide
          Thomas Koch added a comment -

          This preliminary file for now only adds a test and verifies, that the bug is indeed present as described.

          Show
          Thomas Koch added a comment - This preliminary file for now only adds a test and verifies, that the bug is indeed present as described.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch
          against trunk revision 1053497.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch against trunk revision 1053497. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/80//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch
          against trunk revision 1055924.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//testReport/
          Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch against trunk revision 1055924. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//testReport/ Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/89//console This message is automatically generated.
          Hide
          Mahadev konar added a comment -

          thomas, thanks for providing the patch. The test verifies that the problem exists. Do you have a patch for fix this?

          Show
          Mahadev konar added a comment - thomas, thanks for providing the patch. The test verifies that the problem exists. Do you have a patch for fix this?
          Hide
          Mahadev konar added a comment -

          actually this might be a good candidate for 3.3 as well. This would be critical issue for folks using chroot in zookeeper. I am marking it for 3.3.3 and 3.4 and will see if we can get it in 3.3.3.

          Show
          Mahadev konar added a comment - actually this might be a good candidate for 3.3 as well. This would be critical issue for folks using chroot in zookeeper. I am marking it for 3.3.3 and 3.4 and will see if we can get it in 3.3.3.
          Hide
          Mahadev konar added a comment -

          cancellling patch, since it does not include the fix as of yet.

          Show
          Mahadev konar added a comment - cancellling patch, since it does not include the fix as of yet.
          Hide
          Thomas Koch added a comment -

          I haven't done a patch yet since I wanted to do it on top of ZOOKEEPER-911. A patch for 3.3.x would probably be different then what I'd prefer for 3.4.
          In 3.4 I want to share one ClientCnxn over different client code with probably different chroots. Thus even if there are different things running in one JVM that needs different chroots, there would still be only one tcp connection.
          So I'd prefer to store the chroot alongside every watch. The memory consumption shouldn't that much, since it's the same chroot String instance everytime.
          If you agree with this approach, I could provide a patch for 3.3 and 3.4 (on top of ZK-911).

          Show
          Thomas Koch added a comment - I haven't done a patch yet since I wanted to do it on top of ZOOKEEPER-911 . A patch for 3.3.x would probably be different then what I'd prefer for 3.4. In 3.4 I want to share one ClientCnxn over different client code with probably different chroots. Thus even if there are different things running in one JVM that needs different chroots, there would still be only one tcp connection. So I'd prefer to store the chroot alongside every watch. The memory consumption shouldn't that much, since it's the same chroot String instance everytime. If you agree with this approach, I could provide a patch for 3.3 and 3.4 (on top of ZK-911).
          Hide
          Mahadev konar added a comment -

          not a blocker. Moving it out of 3.4 release.

          Show
          Mahadev konar added a comment - not a blocker. Moving it out of 3.4 release.
          Hide
          Benjamin Reed added a comment -

          ZOOKEEPER-911 isn't going to make it into 3.4.0, so we need to get this patch committed without it. are you going to be able to make the change thomas? or do you want me to submit something?

          Show
          Benjamin Reed added a comment - ZOOKEEPER-911 isn't going to make it into 3.4.0, so we need to get this patch committed without it. are you going to be able to make the change thomas? or do you want me to submit something?
          Hide
          Matthias Spycher added a comment -

          Attachment 961b contains a patch and new test that fixes this issue.
          Note that ClientCnxn.java also has a minor change that moves the check for 'closing' in run() after the sleep that was previously part of the startConnect() method.

          Show
          Matthias Spycher added a comment - Attachment 961b contains a patch and new test that fixes this issue. Note that ClientCnxn.java also has a minor change that moves the check for 'closing' in run() after the sleep that was previously part of the startConnect() method.
          Hide
          Mahadev konar added a comment -

          @Matthias,

          Thanks for the patch. Ill take a look by EOD today.

          Show
          Mahadev konar added a comment - @Matthias, Thanks for the patch. Ill take a look by EOD today.
          Hide
          Mahadev konar added a comment -

          @Matthias,
          Do you have a aptch for 3.3 branch as well? Or does the trunk patch apply?

          Show
          Mahadev konar added a comment - @Matthias, Do you have a aptch for 3.3 branch as well? Or does the trunk patch apply?
          Hide
          Matthias Spycher added a comment -

          Reattached 961b with license granted to ASF.

          Show
          Matthias Spycher added a comment - Reattached 961b with license granted to ASF.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/525//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12467353/ZOOKEEPER-961.patch against trunk revision 1166970. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/525//console This message is automatically generated.
          Hide
          Matthias Spycher added a comment -

          Patch should work for both 3.3. and trunk.

          Show
          Matthias Spycher added a comment - Patch should work for both 3.3. and trunk.
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494062/ZOOKEEPER-961b.patch
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494062/ZOOKEEPER-961b.patch against trunk revision 1166970. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/526//console This message is automatically generated.
          Hide
          Thomas Koch added a comment -

          Hi Matthias,

          please be so kind to assign issues to yourself if you work on them. I also spent several hours now for a patch.

          Show
          Thomas Koch added a comment - Hi Matthias, please be so kind to assign issues to yourself if you work on them. I also spent several hours now for a patch.
          Hide
          Thomas Koch added a comment -

          This patch is one I had still around and updated it against trunk.
          It fixes the watch recovery issue and by doing so decouples the circular dependency between ZooKeeper and ClientCnxn

          Show
          Thomas Koch added a comment - This patch is one I had still around and updated it against trunk. It fixes the watch recovery issue and by doing so decouples the circular dependency between ZooKeeper and ClientCnxn
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1809/
          -----------------------------------------------------------

          Review request for zookeeper.

          Summary
          -------

          • chroot is not a property of the ClientCnxn anymore but of the ZooKeeper class
          • ClientCnxn has no reference to ZooKeeper anymore
          • The WatchManager stores a reference to the chroot String with every watcher. It's not that much overhead since it's the same String instance all the time. The idea is, to prepare for additional patches that make it possible to use the same ClientCnxn with different ZooKeeper client instances that have different chroots.

          This addresses bug ZOOKEEPER-961.
          https://issues.apache.org/jira/browse/ZOOKEEPER-961

          Diffs


          src/java/main/org/apache/zookeeper/ClientCnxn.java 78b2eb2
          src/java/main/org/apache/zookeeper/ClientWatchManager.java d56374d
          src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0
          src/java/main/org/apache/zookeeper/client/WatchManager.java PRE-CREATION
          src/java/main/org/apache/zookeeper/client/WatchRegistration.java PRE-CREATION
          src/java/test/org/apache/zookeeper/TestableZooKeeper.java f8344b6
          src/java/test/org/apache/zookeeper/test/WatcherTest.java 90ec513

          Diff: https://reviews.apache.org/r/1809/diff

          Testing
          -------

          Thanks,

          Thomas

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1809/ ----------------------------------------------------------- Review request for zookeeper. Summary ------- chroot is not a property of the ClientCnxn anymore but of the ZooKeeper class ClientCnxn has no reference to ZooKeeper anymore The WatchManager stores a reference to the chroot String with every watcher. It's not that much overhead since it's the same String instance all the time. The idea is, to prepare for additional patches that make it possible to use the same ClientCnxn with different ZooKeeper client instances that have different chroots. This addresses bug ZOOKEEPER-961 . https://issues.apache.org/jira/browse/ZOOKEEPER-961 Diffs src/java/main/org/apache/zookeeper/ClientCnxn.java 78b2eb2 src/java/main/org/apache/zookeeper/ClientWatchManager.java d56374d src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/client/WatchManager.java PRE-CREATION src/java/main/org/apache/zookeeper/client/WatchRegistration.java PRE-CREATION src/java/test/org/apache/zookeeper/TestableZooKeeper.java f8344b6 src/java/test/org/apache/zookeeper/test/WatcherTest.java 90ec513 Diff: https://reviews.apache.org/r/1809/diff Testing ------- Thanks, Thomas
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494089/ZOOKEEPER-961.patch
          against trunk revision 1166970.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494089/ZOOKEEPER-961.patch against trunk revision 1166970. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/528//console This message is automatically generated.
          Hide
          Thomas Koch added a comment -

          my last patch failed grandios and I won't be able to work on this again until wednesday, so unassigning me

          Show
          Thomas Koch added a comment - my last patch failed grandios and I won't be able to work on this again until wednesday, so unassigning me
          Hide
          Matthias Spycher added a comment -

          Sorry Thomas, didn't mean to step on anyone's toes.
          I happened to be working on this locally last week, and later discovered this Jira entry.
          Also, I still think your refactoring/cleanup is worthwhile.

          Show
          Matthias Spycher added a comment - Sorry Thomas, didn't mean to step on anyone's toes. I happened to be working on this locally last week, and later discovered this Jira entry. Also, I still think your refactoring/cleanup is worthwhile.
          Hide
          Mahadev konar added a comment -

          Reuploading Matt's patch for hudson CI.

          Show
          Mahadev konar added a comment - Reuploading Matt's patch for hudson CI.
          Hide
          Mahadev konar added a comment -

          Matthias,
          Looks like the patch doesnt apply to 3.3 branch. Can you please create a patch for that? thanks

          Show
          Mahadev konar added a comment - Matthias, Looks like the patch doesnt apply to 3.3 branch. Can you please create a patch for that? thanks
          Hide
          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494339/ZOOKEEPER-961b.patch
          against trunk revision 1170365.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494339/ZOOKEEPER-961b.patch against trunk revision 1170365. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/530//console This message is automatically generated.
          Hide
          Matthias Spycher added a comment -

          This attachment is intended for branch 3.3.
          It prepends the chroot (if any) for all watches in SendThread.primeConnection(...).
          Also note that the check for closing is done in SendThread.startConnect() after the sleep. We still have a potential race between close/connecting, but it's an improvement.

          Show
          Matthias Spycher added a comment - This attachment is intended for branch 3.3. It prepends the chroot (if any) for all watches in SendThread.primeConnection(...). Also note that the check for closing is done in SendThread.startConnect() after the sleep. We still have a potential race between close/connecting, but it's an improvement.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12494355/ZOOKEEPER-961.patch
          against trunk revision 1170365.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/533//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12494355/ZOOKEEPER-961.patch against trunk revision 1170365. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/533//console This message is automatically generated.
          Hide
          Mahadev konar added a comment -

          +1 looks good to me.

          Show
          Mahadev konar added a comment - +1 looks good to me.
          Hide
          Mahadev konar added a comment -

          I just pushed this to 3.3 and trunk. Thanks Matthias!

          Show
          Mahadev konar added a comment - I just pushed this to 3.3 and trunk. Thanks Matthias!
          Hide
          Mahadev konar added a comment -

          @Thomas Thomas Koch ,
          I think your patch attachment: https://issues.apache.org/jira/secure/attachment/12494089/ZOOKEEPER-961.patch is definitely worth going into 3.5.0. I like the way you have cleaned up the interaction between ClientCnxn and ZooKeeper. Would you mind filing a seperate jira for that? I'd be happy to review and commit it into trunk.

          Show
          Mahadev konar added a comment - @Thomas Thomas Koch , I think your patch attachment: https://issues.apache.org/jira/secure/attachment/12494089/ZOOKEEPER-961.patch is definitely worth going into 3.5.0. I like the way you have cleaned up the interaction between ClientCnxn and ZooKeeper. Would you mind filing a seperate jira for that? I'd be happy to review and commit it into trunk.
          Hide
          Camille Fournier added a comment -

          Why is there a change to the isConnected logic that came in with this patch? Specifically, why on line 967 do we now have:
          if (closing || !state.isAlive())

          { break; }

          ? This seems unrelated to the fix.

          Show
          Camille Fournier added a comment - Why is there a change to the isConnected logic that came in with this patch? Specifically, why on line 967 do we now have: if (closing || !state.isAlive()) { break; } ? This seems unrelated to the fix.
          Hide
          Matthias Spycher added a comment -

          Agreed, this line isn't related to the fix. It slipped in - I should probably have separated the changes into two separate patches.
          It has to do with honoring the close() as early as possible. Note that if state.isAlive() returns false, the while loop ends, so we shouldn't issue warnings about attempting to reconnect. I don't see how it changes the isConnected logic. My assumption here is that we shouldn't allow a transition from CLOSE or AUTH_FAILED back to CONNECTING.

          Show
          Matthias Spycher added a comment - Agreed, this line isn't related to the fix. It slipped in - I should probably have separated the changes into two separate patches. It has to do with honoring the close() as early as possible. Note that if state.isAlive() returns false, the while loop ends, so we shouldn't issue warnings about attempting to reconnect. I don't see how it changes the isConnected logic. My assumption here is that we shouldn't allow a transition from CLOSE or AUTH_FAILED back to CONNECTING.
          Hide
          Camille Fournier added a comment -

          It definitely shouldn't be part of this fix, and deserves a separate checkin (and testing). Mahadev, how do you want to handle this?

          Show
          Camille Fournier added a comment - It definitely shouldn't be part of this fix, and deserves a separate checkin (and testing). Mahadev, how do you want to handle this?
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/1809/
          -----------------------------------------------------------

          (Updated 2011-09-14 22:16:59.743604)

          Review request for zookeeper.

          Summary (updated)
          -------

          • chroot is not a property of the ClientCnxn anymore but of the ZooKeeper class
          • ClientCnxn has no reference to ZooKeeper anymore
          • The WatchManager stores a reference to the chroot String with every watcher. It's not that much overhead since it's the same String instance all the time. The idea is, to prepare for additional patches that make it possible to use the same ClientCnxn with different ZooKeeper client instances that have different chroots.

          This addresses bug ZOOKEEPER-961.
          https://issues.apache.org/jira/browse/ZOOKEEPER-961

          Diffs


          src/java/main/org/apache/zookeeper/ClientCnxn.java 78b2eb2
          src/java/main/org/apache/zookeeper/ClientWatchManager.java d56374d
          src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0
          src/java/main/org/apache/zookeeper/client/WatchManager.java PRE-CREATION
          src/java/main/org/apache/zookeeper/client/WatchRegistration.java PRE-CREATION
          src/java/test/org/apache/zookeeper/TestableZooKeeper.java f8344b6
          src/java/test/org/apache/zookeeper/test/WatcherTest.java 90ec513

          Diff: https://reviews.apache.org/r/1809/diff

          Testing
          -------

          Thanks,

          Thomas

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1809/ ----------------------------------------------------------- (Updated 2011-09-14 22:16:59.743604) Review request for zookeeper. Summary (updated) ------- chroot is not a property of the ClientCnxn anymore but of the ZooKeeper class ClientCnxn has no reference to ZooKeeper anymore The WatchManager stores a reference to the chroot String with every watcher. It's not that much overhead since it's the same String instance all the time. The idea is, to prepare for additional patches that make it possible to use the same ClientCnxn with different ZooKeeper client instances that have different chroots. This addresses bug ZOOKEEPER-961 . https://issues.apache.org/jira/browse/ZOOKEEPER-961 Diffs src/java/main/org/apache/zookeeper/ClientCnxn.java 78b2eb2 src/java/main/org/apache/zookeeper/ClientWatchManager.java d56374d src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/client/WatchManager.java PRE-CREATION src/java/main/org/apache/zookeeper/client/WatchRegistration.java PRE-CREATION src/java/test/org/apache/zookeeper/TestableZooKeeper.java f8344b6 src/java/test/org/apache/zookeeper/test/WatcherTest.java 90ec513 Diff: https://reviews.apache.org/r/1809/diff Testing ------- Thanks, Thomas
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1304 (See https://builds.apache.org/job/ZooKeeper-trunk/1304/)
          ZOOKEEPER-961. Watch recovery after disconnection when connection string contains a prefix. (Matthias Spycher via mahadev)

          mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170438
          Files :

          • /zookeeper/trunk/CHANGES.txt
          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
          • /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/DisconnectedWatcherTest.java
          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #1304 (See https://builds.apache.org/job/ZooKeeper-trunk/1304/ ) ZOOKEEPER-961 . Watch recovery after disconnection when connection string contains a prefix. (Matthias Spycher via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1170438 Files : /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/test/DisconnectedWatcherTest.java
          Hide
          Mahadev konar added a comment -

          @Camille,
          Sorry I missed the comment. The change regarding the check seems ok to me. I do agree it should have been a seperate jira, I am ok if you want to pull that change out of 3.4. Let me know.

          Show
          Mahadev konar added a comment - @Camille, Sorry I missed the comment. The change regarding the check seems ok to me. I do agree it should have been a seperate jira, I am ok if you want to pull that change out of 3.4. Let me know.
          Hide
          Camille Fournier added a comment -

          I'll pull it out, I don't think it makes any sense and it shouldn't be in without a test. This includes these pieces:

          private void startConnect() throws IOException {

          • if(!isFirstConnect){
          • try { - Thread.sleep(r.nextInt(1000)); - }

            catch (InterruptedException e)

            { - LOG.warn("Unexpected exception", e); - }
          • }
            state = States.CONNECTING;

          InetSocketAddress addr;
          @@ -937,8 +956,15 @@
          while (state.isAlive()) {
          try {
          if (!clientCnxnSocket.isConnected()) {
          + if(!isFirstConnect){
          + try

          { + Thread.sleep(r.nextInt(1000)); + }

          catch (InterruptedException e)

          { + LOG.warn("Unexpected exception", e); + }

          + }
          // don't re-establish connection if we are closing

          • if (closing) {
            + if (closing || !state.isAlive()) { break; }

            startConnect();

          Show
          Camille Fournier added a comment - I'll pull it out, I don't think it makes any sense and it shouldn't be in without a test. This includes these pieces: private void startConnect() throws IOException { if(!isFirstConnect){ try { - Thread.sleep(r.nextInt(1000)); - } catch (InterruptedException e) { - LOG.warn("Unexpected exception", e); - } } state = States.CONNECTING; InetSocketAddress addr; @@ -937,8 +956,15 @@ while (state.isAlive()) { try { if (!clientCnxnSocket.isConnected()) { + if(!isFirstConnect){ + try { + Thread.sleep(r.nextInt(1000)); + } catch (InterruptedException e) { + LOG.warn("Unexpected exception", e); + } + } // don't re-establish connection if we are closing if (closing) { + if (closing || !state.isAlive()) { break; } startConnect();
          Hide
          Mahadev konar added a comment -

          @Camille,
          I actually like the change in the first part. Makes it cleaner and is also a no op change. What do you think?

          Show
          Mahadev konar added a comment - @Camille, I actually like the change in the first part. Makes it cleaner and is also a no op change. What do you think?
          Hide
          Camille Fournier added a comment -

          It's not a no-op change. In the original code, we check connected, check the closing flag, then wait in the startConnect loop. In the changed version, we check connected, wait, then check closing, then start connect. There's a meaningful difference in checking closing before we do the wait and attempted connection logic, and waiting, then checking close, then trying to connect.

          So, if we want to wait and then check the close before trying to connect, vs the original way, we can do that, but let's make it a conscious decision rather than something that actually got pulled in.

          The second part of the change is actually more of a no-op because I'm pretty sure you can't change the state anywhere but the send thread, so checking it after a sleep on that thread is pointless.

          Show
          Camille Fournier added a comment - It's not a no-op change. In the original code, we check connected, check the closing flag, then wait in the startConnect loop. In the changed version, we check connected, wait, then check closing, then start connect. There's a meaningful difference in checking closing before we do the wait and attempted connection logic, and waiting, then checking close, then trying to connect. So, if we want to wait and then check the close before trying to connect, vs the original way, we can do that, but let's make it a conscious decision rather than something that actually got pulled in. The second part of the change is actually more of a no-op because I'm pretty sure you can't change the state anywhere but the send thread, so checking it after a sleep on that thread is pointless.
          Hide
          Matthias Spycher added a comment -

          I'm working on a more elaborate fix to prevent races from overwriting CLOSED (at least for 3.4).
          I agree that the first part in the change above is more important than the one line in the exception catch clause, but feel free to move it to another issue.

          Show
          Matthias Spycher added a comment - I'm working on a more elaborate fix to prevent races from overwriting CLOSED (at least for 3.4). I agree that the first part in the change above is more important than the one line in the exception catch clause, but feel free to move it to another issue.
          Hide
          Camille Fournier added a comment -

          Oh Matthias, can you comment on the condition you're looking at? Is it something along the lines of
          https://issues.apache.org/jira/browse/ZOOKEEPER-1159?

          I never got anywhere with that and if you think you have figured out what the problem is it would be great to know.

          Show
          Camille Fournier added a comment - Oh Matthias, can you comment on the condition you're looking at? Is it something along the lines of https://issues.apache.org/jira/browse/ZOOKEEPER-1159? I never got anywhere with that and if you think you have figured out what the problem is it would be great to know.
          Hide
          Matthias Spycher added a comment -

          @Camille, you do see the problem of CLOSED being stomped by startConnect() right?

          Show
          Matthias Spycher added a comment - @Camille, you do see the problem of CLOSED being stomped by startConnect() right?
          Hide
          Matthias Spycher added a comment -

          State transitions in the SendThread are generally not safe. We don't prevent transitions like CLOSED > CONNECTING or CLOSED > CONNECTED – that's what I'm addressing in my next patch.

          Show
          Matthias Spycher added a comment - State transitions in the SendThread are generally not safe. We don't prevent transitions like CLOSED > CONNECTING or CLOSED > CONNECTED – that's what I'm addressing in my next patch.
          Hide
          Camille Fournier added a comment -

          Mmm I see, at least for calls to close/disconnect.

          Let's wrap all of this up in the fix for that issue. I suspect it will fix 1159, if you want to comment in there.

          Show
          Camille Fournier added a comment - Mmm I see, at least for calls to close/disconnect. Let's wrap all of this up in the fix for that issue. I suspect it will fix 1159, if you want to comment in there.
          Hide
          Camille Fournier added a comment -

          I'm rolling the unrelated changes that were checked in on this ticket from both 3.3.3 and 3.4. Please submit another ticket for those changes, with tests.

          Show
          Camille Fournier added a comment - I'm rolling the unrelated changes that were checked in on this ticket from both 3.3.3 and 3.4. Please submit another ticket for those changes, with tests.
          Hide
          Hudson added a comment -

          Integrated in ZooKeeper-trunk #1309 (See https://builds.apache.org/job/ZooKeeper-trunk/1309/)
          rolling back excess checkin from ZOOKEEPER-961

          camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1172406
          Files :

          • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
          Show
          Hudson added a comment - Integrated in ZooKeeper-trunk #1309 (See https://builds.apache.org/job/ZooKeeper-trunk/1309/ ) rolling back excess checkin from ZOOKEEPER-961 camille : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1172406 Files : /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java

            People

            • Assignee:
              Matthias Spycher
              Reporter:
              pmpm47
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development