Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.5.0
    • Component/s: java client, server
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      currently the only way a watch cleared is to trigger it. we need a way to enumerate the outstanding watch objects, find watch events the objects are watching for, and remove interests in an event.

      1. Remove Watch API.pdf
        303 kB
        Rakesh R
      2. ZOOKEEPER-442.patch
        68 kB
        Rakesh R
      3. ZOOKEEPER-442.patch
        69 kB
        Rakesh R
      4. ZOOKEEPER-442.patch
        68 kB
        Rakesh R
      5. ZOOKEEPER-442.patch
        62 kB
        Rakesh R
      6. ZOOKEEPER-442.patch
        62 kB
        Rakesh R
      7. ZOOKEEPER-442.patch
        58 kB
        Rakesh R
      8. ZOOKEEPER-442.patch
        58 kB
        Rakesh R
      9. ZOOKEEPER-442.patch
        58 kB
        Rakesh R
      10. ZOOKEEPER-442.patch
        43 kB
        Daniel Gómez Ferro
      11. ZOOKEEPER-442.patch
        38 kB
        Daniel Gómez Ferro
      12. ZOOKEEPER-442.patch
        39 kB
        Daniel Gómez Ferro
      13. ZOOKEEPER-442.patch
        40 kB
        Daniel Gómez Ferro
      14. ZOOKEEPER-442.patch
        38 kB
        Daniel Gómez Ferro
      15. ZOOKEEPER-442.patch
        40 kB
        Daniel Gómez Ferro
      16. ZOOKEEPER-442.patch
        27 kB
        Daniel Gómez Ferro

        Activity

        Hide
        Benjamin Reed added a comment -

        there are two problematic scenarios: 1) an application that has many transient interests can register a bunch of watches which wastes memory to monitor the watches (granted it is a very small amount of memory) and it cases unnecessary processing when those watches are triggered 2) applications need to be prepared to ignore watch events that they are no longer interested in.

        Show
        Benjamin Reed added a comment - there are two problematic scenarios: 1) an application that has many transient interests can register a bunch of watches which wastes memory to monitor the watches (granted it is a very small amount of memory) and it cases unnecessary processing when those watches are triggered 2) applications need to be prepared to ignore watch events that they are no longer interested in.
        Hide
        Patrick Hunt added a comment -

        we must be careful to clearly define/implement the semantics of removing a watch. for example watches may have been
        triggered and a notification delivered to the client but not yet processed by the client - it's waiting in the event queue. etc...
        what are we removing? the trigger or both the trigger and notification(s)? what if one client has been notified but not
        others (say two clients watch a node N, c1 is on a fast server and gets notified, c2 removes the watch on a slow server
        before the trigger fires, ...) etc...

        it would be great if someone was willing to document a proposal(s) and post it on this jira for review/comment. once
        we have something clearly defined/agreedupon it shouldn't be too hard to implement. (hah! )

        Show
        Patrick Hunt added a comment - we must be careful to clearly define/implement the semantics of removing a watch. for example watches may have been triggered and a notification delivered to the client but not yet processed by the client - it's waiting in the event queue. etc... what are we removing? the trigger or both the trigger and notification(s)? what if one client has been notified but not others (say two clients watch a node N, c1 is on a fast server and gets notified, c2 removes the watch on a slow server before the trigger fires, ...) etc... it would be great if someone was willing to document a proposal(s) and post it on this jira for review/comment. once we have something clearly defined/agreedupon it shouldn't be too hard to implement. (hah! )
        Hide
        Patrick Hunt added a comment -

        this is a comment I made on the mailing list:

        We probably want to allow the caller to specify which type of watch they want to remove - a watch on the znode itself, on children of the znode, or both.

        zk.removeWatch(path, watcher, wtype)
        where:
        path is path to the znode
        watcher may be a specific watcher or null matching all watchers
        wtype is enum of TYPE_CHILD, TYPE_DATA, or TYPE_ALL (something like that)

        We also need to be careful of the semantics since "watch" wraps up a couple of concepts: the trigger and the notification. What are we removing? The trigger or both? If just the trigger then you may still be notified after removeWatch is called (if notification is "in flight"). Perhaps removal of the watch should itself trigger the notification to all watchers, with a new notification type of "watch removed"?

        Show
        Patrick Hunt added a comment - this is a comment I made on the mailing list: We probably want to allow the caller to specify which type of watch they want to remove - a watch on the znode itself, on children of the znode, or both. zk.removeWatch(path, watcher, wtype) where: path is path to the znode watcher may be a specific watcher or null matching all watchers wtype is enum of TYPE_CHILD, TYPE_DATA, or TYPE_ALL (something like that) We also need to be careful of the semantics since "watch" wraps up a couple of concepts: the trigger and the notification. What are we removing? The trigger or both? If just the trigger then you may still be notified after removeWatch is called (if notification is "in flight"). Perhaps removal of the watch should itself trigger the notification to all watchers, with a new notification type of "watch removed"?
        Hide
        Henry Robinson added a comment -

        I think if anything applications are going to use this as a local barrier, so that they can assume that no watches will be called after the disable_watch command returns, rather than a remote one that guarantees that no more notifications will be sent.

        It's reasonably hard to guarantee this property, however, as we cannot predict what the scheduler is going to do, and therefore outstanding watches on the client may not be scheduled until an arbitrary time after they are supposedly disabled (because the trigger point comes before the disable command on the local machine). So maybe it's better not to offer a feature that might be misunderstood?

        Show
        Henry Robinson added a comment - I think if anything applications are going to use this as a local barrier, so that they can assume that no watches will be called after the disable_watch command returns, rather than a remote one that guarantees that no more notifications will be sent. It's reasonably hard to guarantee this property, however, as we cannot predict what the scheduler is going to do, and therefore outstanding watches on the client may not be scheduled until an arbitrary time after they are supposedly disabled (because the trigger point comes before the disable command on the local machine). So maybe it's better not to offer a feature that might be misunderstood?
        Hide
        Gunnar Wagenknecht added a comment -

        I'm affected by the transient scenario. It would be nice if there would be a method to remove/unregister a watch by object instance (no matter what triggers, etc.).

        Show
        Gunnar Wagenknecht added a comment - I'm affected by the transient scenario. It would be nice if there would be a method to remove/unregister a watch by object instance (no matter what triggers, etc.).
        Hide
        Vitalii Tymchyshyn added a comment -

        My little comment on why I need this feature to ensure my usecase will work: My application makes a job node and adds a watch on result node (unexisting, by name). After some time it may decide that a job is no longer needed (e.g. timeouted) and it removes job node and must unwatch result node. If it won't do so, there will be watch leak - this watches will never be removed as job name won't be reused and result will never be generated. Currently I issue "create node/delete node" commands to fire the watch, but it takes processing power and is ugly.

        Show
        Vitalii Tymchyshyn added a comment - My little comment on why I need this feature to ensure my usecase will work: My application makes a job node and adds a watch on result node (unexisting, by name). After some time it may decide that a job is no longer needed (e.g. timeouted) and it removes job node and must unwatch result node. If it won't do so, there will be watch leak - this watches will never be removed as job name won't be reused and result will never be generated. Currently I issue "create node/delete node" commands to fire the watch, but it takes processing power and is ugly.
        Hide
        Benjamin Reed added a comment -

        yes, vitalii, the exists watch for a node that will never exist, and the process that establishes the watch knows this, is probably the most compelling use case.

        just a slight clarification: the watch will be removed if the session that set it is closed. in many environments there are very long sessions; it's not forever, but it approximates forever

        Show
        Benjamin Reed added a comment - yes, vitalii, the exists watch for a node that will never exist, and the process that establishes the watch knows this, is probably the most compelling use case. just a slight clarification: the watch will be removed if the session that set it is closed. in many environments there are very long sessions; it's not forever, but it approximates forever
        Hide
        Gunnar Wagenknecht added a comment -

        This is getting critical in our environment. From a heap dump we recognized that the HashMap with all the values was very large and consumed ~70% of the heap (~3 million entries, collected over just a few days, ~8 million capacity).

        Our usage patter might be wrong, though. I'm setting an exists watch in order to wait for a node to be deleted within a certain timeout. The watch contains a CountDownLatch which allows me to sleep until the watch triggers or the timeout triggers. in the latter case I really need to remove the watch.

        A workaround would be to not set a watch at all and just sleep for a while and check in a loop. However, that would increase the read traffic to ZooKeeper dramatically.

        Any ideas?

        Show
        Gunnar Wagenknecht added a comment - This is getting critical in our environment. From a heap dump we recognized that the HashMap with all the values was very large and consumed ~70% of the heap (~3 million entries, collected over just a few days, ~8 million capacity). Our usage patter might be wrong, though. I'm setting an exists watch in order to wait for a node to be deleted within a certain timeout. The watch contains a CountDownLatch which allows me to sleep until the watch triggers or the timeout triggers. in the latter case I really need to remove the watch. A workaround would be to not set a watch at all and just sleep for a while and check in a loop. However, that would increase the read traffic to ZooKeeper dramatically. Any ideas?
        Hide
        Daniel Gómez Ferro added a comment -

        Are we still interested on this feature after ZOOKEEPER-1177 gets integrated?

        I have implemented a prototype based on Patrick's suggestion. Issues:

        • The removal doesn't guarantee that the watcher won't trigger.
        • If the user can specify a concrete watcher to be removed, sometimes we only have to remove the watcher on the client's side without contacting the server. If we want the async version of the call we need a mechanism to trigger the callback directly.
        Show
        Daniel Gómez Ferro added a comment - Are we still interested on this feature after ZOOKEEPER-1177 gets integrated? I have implemented a prototype based on Patrick's suggestion. Issues: The removal doesn't guarantee that the watcher won't trigger. If the user can specify a concrete watcher to be removed, sometimes we only have to remove the watcher on the client's side without contacting the server. If we want the async version of the call we need a mechanism to trigger the callback directly.
        Hide
        Daniel Gómez Ferro added a comment -

        Prototype of synchronous API to remove watches.

        Show
        Daniel Gómez Ferro added a comment - Prototype of synchronous API to remove watches.
        Hide
        Vitalii Tymchyshyn added a comment -

        I am still interesed in this ticket for two reasons:

        • This is a leak. Any leak must be fixed
        • While ZOOKEEPER-1177 fixes server-side, it does not help client-side, is it? And in client side every watch is a listener object, so this will use memory quite fast.
        Show
        Vitalii Tymchyshyn added a comment - I am still interesed in this ticket for two reasons: This is a leak. Any leak must be fixed While ZOOKEEPER-1177 fixes server-side, it does not help client-side, is it? And in client side every watch is a listener object, so this will use memory quite fast.
        Hide
        Gunnar Wagenknecht added a comment -

        Same here. We need a way to remove watches on the client side when they are no longer of interest. Otherwise they won't be garbage collected if they never trigger.

        Show
        Gunnar Wagenknecht added a comment - Same here. We need a way to remove watches on the client side when they are no longer of interest. Otherwise they won't be garbage collected if they never trigger.
        Hide
        Daniel Gómez Ferro added a comment -

        How should we deal with the asynchronous version of the call? I see three options:

        • Don't provide an async version.
        • Remove the Watcher parameter, effectively removing all watches on the given znode.
        • Find a way to trigger the callback directly from the client (could also help ZOOKEEPER-847, where we want to throw an exception and invoke the callback)
        Show
        Daniel Gómez Ferro added a comment - How should we deal with the asynchronous version of the call? I see three options: Don't provide an async version. Remove the Watcher parameter, effectively removing all watches on the given znode. Find a way to trigger the callback directly from the client (could also help ZOOKEEPER-847 , where we want to throw an exception and invoke the callback)
        Hide
        Gunnar Wagenknecht added a comment -

        From my point of view I need the following:

        • Remove all local references to watcher object together with all coressponsing watcher objects in server (if connection is available).

        The second part should/can be done asynchronous. Especially because I'm not interested in the result (other clients actually may be interested).

        Show
        Gunnar Wagenknecht added a comment - From my point of view I need the following: Remove all local references to watcher object together with all coressponsing watcher objects in server (if connection is available). The second part should/can be done asynchronous. Especially because I'm not interested in the result (other clients actually may be interested).
        Hide
        Vitalii Tymchyshyn added a comment -

        OK, as for me:

        • On client each watcher group (grouping watchers on path) should have watcher state - e.g. "normal/deleting/creating"
        • Delete watcher message to server must be acked
        • If last watcher from group is removed, delete message is sent to server and watcher goes to deleting state
        • As soon as deletion is acked (and group is still in deletion), watcher group is removed on client
        • If new watcher to group is added and watcher group is new or in deleting state, "watch" message to server is sent and group goes to creating state
        • As soon as creation is acked ( and group is still in creating), watcher group goes to "normal".
        Show
        Vitalii Tymchyshyn added a comment - OK, as for me: On client each watcher group (grouping watchers on path) should have watcher state - e.g. "normal/deleting/creating" Delete watcher message to server must be acked If last watcher from group is removed, delete message is sent to server and watcher goes to deleting state As soon as deletion is acked (and group is still in deletion), watcher group is removed on client If new watcher to group is added and watcher group is new or in deleting state, "watch" message to server is sent and group goes to creating state As soon as creation is acked ( and group is still in creating), watcher group goes to "normal".
        Hide
        Daniel Gómez Ferro added a comment -

        I've attached a new patch with the asynchronous call and a mechanism to queue local callbacks on the event thread, which could also be used in ZOOKEEPER-847.

        Vitalii, I don't understand why would we need to keep track of the watcher group's state (normal/deleting/creating), could you explain?

        Show
        Daniel Gómez Ferro added a comment - I've attached a new patch with the asynchronous call and a mechanism to queue local callbacks on the event thread, which could also be used in ZOOKEEPER-847 . Vitalii, I don't understand why would we need to keep track of the watcher group's state (normal/deleting/creating), could you explain?
        Hide
        Vitalii Tymchyshyn added a comment -

        What I worry about is :

        • A group has watch X
        • A watch X is deleted, since it's last one - delete watch request is sent to server. Client watch list is not changed (true?) until ack is received
        • A watch Y is added. You must sent add watch request, but how do you know it? X is still in the list, so it's not empty.

        P.S. I must check the code to know time when client watch list is changed. This can be either request time or ack time. Both ways has it's tricky scenarios.

        Show
        Vitalii Tymchyshyn added a comment - What I worry about is : A group has watch X A watch X is deleted, since it's last one - delete watch request is sent to server. Client watch list is not changed (true?) until ack is received A watch Y is added. You must sent add watch request, but how do you know it? X is still in the list, so it's not empty. P.S. I must check the code to know time when client watch list is changed. This can be either request time or ack time. Both ways has it's tricky scenarios.
        Hide
        Daniel Gómez Ferro added a comment -

        When you set a watch you always send the request to the server, as the watch is set through a getChildren()/exists()/... call. So I think this is not a problem.

        Show
        Daniel Gómez Ferro added a comment - When you set a watch you always send the request to the server, as the watch is set through a getChildren()/exists()/... call. So I think this is not a problem.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12500857/ZOOKEEPER-442.patch
        against trunk revision 1189318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/695//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12500857/ZOOKEEPER-442.patch against trunk revision 1189318. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/695//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501063/ZOOKEEPER-442.patch
        against trunk revision 1189318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/696//testReport/
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/696//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501063/ZOOKEEPER-442.patch against trunk revision 1189318. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/696//testReport/ Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/696//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501066/ZOOKEEPER-442.patch
        against trunk revision 1189318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501066/ZOOKEEPER-442.patch against trunk revision 1189318. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/697//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501075/ZOOKEEPER-442.patch
        against trunk revision 1189318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The patch appears to cause tar ant target to fail.

        -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/699//testReport/
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/699//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501075/ZOOKEEPER-442.patch against trunk revision 1189318. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/699//testReport/ Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/699//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501078/ZOOKEEPER-442.patch
        against trunk revision 1189318.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501078/ZOOKEEPER-442.patch against trunk revision 1189318. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/701//console This message is automatically generated.
        Hide
        Daniel Gómez Ferro added a comment -

        I think it is possible to guarantee that after the call finishes (or the callback gets notified) the watcher won't be triggered. I've attached a new patch doing this, but I'm not quite sure about the design, so any comment is welcomed.

        There is still the problem of what happens if one server receives the watch event before removing the watch, and another server afterwards. What do you think?

        Show
        Daniel Gómez Ferro added a comment - I think it is possible to guarantee that after the call finishes (or the callback gets notified) the watcher won't be triggered. I've attached a new patch doing this, but I'm not quite sure about the design, so any comment is welcomed. There is still the problem of what happens if one server receives the watch event before removing the watch, and another server afterwards. What do you think?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12501290/ZOOKEEPER-442.patch
        against trunk revision 1190073.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501290/ZOOKEEPER-442.patch against trunk revision 1190073. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/718//console This message is automatically generated.
        Hide
        Mahadev konar added a comment -

        @Ben,
        Can you please take a look at this patch?

        Show
        Mahadev konar added a comment - @Ben, Can you please take a look at this patch?
        Hide
        Benjamin Reed added a comment -

        daniel, i'm ready to look at this. can you put it on reviews.apache.org?

        Show
        Benjamin Reed added a comment - daniel, i'm ready to look at this. can you put it on reviews.apache.org?
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3364/
        -----------------------------------------------------------

        Review request for zookeeper and Benjamin Reed.

        Summary
        -------

        Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified.

        With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first.

        This addresses bug ZOOKEEPER-442.
        https://issues.apache.org/jira/browse/ZOOKEEPER-442

        Diffs


        src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40
        src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c
        src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b
        src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20
        src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0
        src/java/main/org/apache/zookeeper/server/DataTree.java 757a572
        src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a
        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74
        src/java/main/org/apache/zookeeper/server/Request.java c6a2249
        src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815
        src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803
        src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION
        src/zookeeper.jute d24e145

        Diff: https://reviews.apache.org/r/3364/diff

        Testing
        -------

        Added unit test that checks client side semantics.

        Thanks,

        Daniel

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/ ----------------------------------------------------------- Review request for zookeeper and Benjamin Reed. Summary ------- Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified. With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first. This addresses bug ZOOKEEPER-442 . https://issues.apache.org/jira/browse/ZOOKEEPER-442 Diffs src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20 src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/server/DataTree.java 757a572 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74 src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815 src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803 src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION src/zookeeper.jute d24e145 Diff: https://reviews.apache.org/r/3364/diff Testing ------- Added unit test that checks client side semantics. Thanks, Daniel
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3364/#review4182
        -----------------------------------------------------------

        This is only a first pass. Looks ok but I would like more attention to anything you're not really sure about.

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9414>

        Maybe collapse this to if(watchers != null && watchers.contains(watcher)) ?

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9415>

        I don't think that you should override this method to do two things based on the variables passed in, it's confusing and unnecessary. Can we have instead two methods, one for removing all watchers and one for just removing a watch if it exists?

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9416>

        ??

        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
        <https://reviews.apache.org/r/3364/#comment9417>

        Let's not check in files that are just reordering of imports...

        • Camille

        On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3364/

        -----------------------------------------------------------

        (Updated 2012-01-04 09:41:43)

        Review request for zookeeper and Benjamin Reed.

        Summary

        -------

        Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified.

        With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first.

        This addresses bug ZOOKEEPER-442.

        https://issues.apache.org/jira/browse/ZOOKEEPER-442

        Diffs

        -----

        src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40

        src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c

        src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b

        src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20

        src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0

        src/java/main/org/apache/zookeeper/server/DataTree.java 757a572

        src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a

        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74

        src/java/main/org/apache/zookeeper/server/Request.java c6a2249

        src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815

        src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803

        src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION

        src/zookeeper.jute d24e145

        Diff: https://reviews.apache.org/r/3364/diff

        Testing

        -------

        Added unit test that checks client side semantics.

        Thanks,

        Daniel

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/#review4182 ----------------------------------------------------------- This is only a first pass. Looks ok but I would like more attention to anything you're not really sure about. src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9414 > Maybe collapse this to if(watchers != null && watchers.contains(watcher)) ? src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9415 > I don't think that you should override this method to do two things based on the variables passed in, it's confusing and unnecessary. Can we have instead two methods, one for removing all watchers and one for just removing a watch if it exists? src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9416 > ?? src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java < https://reviews.apache.org/r/3364/#comment9417 > Let's not check in files that are just reordering of imports... Camille On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/ ----------------------------------------------------------- (Updated 2012-01-04 09:41:43) Review request for zookeeper and Benjamin Reed. Summary ------- Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified. With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first. This addresses bug ZOOKEEPER-442 . https://issues.apache.org/jira/browse/ZOOKEEPER-442 Diffs ----- src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20 src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/server/DataTree.java 757a572 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74 src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815 src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803 src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION src/zookeeper.jute d24e145 Diff: https://reviews.apache.org/r/3364/diff Testing ------- Added unit test that checks client side semantics. Thanks, Daniel
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3364/#review4184
        -----------------------------------------------------------

        As part of this patch the programmer guide needs to be updated to detail how this feature works. In particular it should detail the semantics of watch removal, any "gotchas" to watch out for, etc...

        • Patrick

        On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3364/

        -----------------------------------------------------------

        (Updated 2012-01-04 09:41:43)

        Review request for zookeeper and Benjamin Reed.

        Summary

        -------

        Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified.

        With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first.

        This addresses bug ZOOKEEPER-442.

        https://issues.apache.org/jira/browse/ZOOKEEPER-442

        Diffs

        -----

        src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40

        src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c

        src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b

        src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20

        src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0

        src/java/main/org/apache/zookeeper/server/DataTree.java 757a572

        src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a

        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74

        src/java/main/org/apache/zookeeper/server/Request.java c6a2249

        src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815

        src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803

        src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION

        src/zookeeper.jute d24e145

        Diff: https://reviews.apache.org/r/3364/diff

        Testing

        -------

        Added unit test that checks client side semantics.

        Thanks,

        Daniel

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/#review4184 ----------------------------------------------------------- As part of this patch the programmer guide needs to be updated to detail how this feature works. In particular it should detail the semantics of watch removal, any "gotchas" to watch out for, etc... Patrick On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/ ----------------------------------------------------------- (Updated 2012-01-04 09:41:43) Review request for zookeeper and Benjamin Reed. Summary ------- Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified. With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first. This addresses bug ZOOKEEPER-442 . https://issues.apache.org/jira/browse/ZOOKEEPER-442 Diffs ----- src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20 src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/server/DataTree.java 757a572 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74 src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815 src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803 src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION src/zookeeper.jute d24e145 Diff: https://reviews.apache.org/r/3364/diff Testing ------- Added unit test that checks client side semantics. Thanks, Daniel
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3364/#review4185
        -----------------------------------------------------------

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9424>

        This class is getting rather large, it should be refactored into it's own file.

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9426>

        all watchers registered by this session, correct?

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9427>

        this is confusing, what does "leak" mean? It might be good to move this detail from here to the programmer guide.

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9428>

        @throws?

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9425>

        Update all the @since with expected release - looks like 3.5.0

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9429>

        why type vs having explicit methods?

        removeChildWatches
        removeDataWatches

        we're unlikely to add more watch types.

        On second thought though, shouldn't we have a "both" (or "alltypes", etc...) option as well? If that's the case perhaps 3 is enough to have a type...

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9432>

        since is incorrect

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9430>

        async operations don't throw keeperexception, see other async ops such as getData

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9431>

        see my comment on method signature - your intuition was correct.

        • Patrick

        On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3364/

        -----------------------------------------------------------

        (Updated 2012-01-04 09:41:43)

        Review request for zookeeper and Benjamin Reed.

        Summary

        -------

        Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified.

        With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first.

        This addresses bug ZOOKEEPER-442.

        https://issues.apache.org/jira/browse/ZOOKEEPER-442

        Diffs

        -----

        src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40

        src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c

        src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b

        src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20

        src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0

        src/java/main/org/apache/zookeeper/server/DataTree.java 757a572

        src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a

        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74

        src/java/main/org/apache/zookeeper/server/Request.java c6a2249

        src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815

        src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803

        src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION

        src/zookeeper.jute d24e145

        Diff: https://reviews.apache.org/r/3364/diff

        Testing

        -------

        Added unit test that checks client side semantics.

        Thanks,

        Daniel

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/#review4185 ----------------------------------------------------------- src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9424 > This class is getting rather large, it should be refactored into it's own file. src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9426 > all watchers registered by this session , correct? src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9427 > this is confusing, what does "leak" mean? It might be good to move this detail from here to the programmer guide. src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9428 > @throws? src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9425 > Update all the @since with expected release - looks like 3.5.0 src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9429 > why type vs having explicit methods? removeChildWatches removeDataWatches we're unlikely to add more watch types. On second thought though, shouldn't we have a "both" (or "alltypes", etc...) option as well? If that's the case perhaps 3 is enough to have a type... src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9432 > since is incorrect src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9430 > async operations don't throw keeperexception, see other async ops such as getData src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9431 > see my comment on method signature - your intuition was correct. Patrick On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/ ----------------------------------------------------------- (Updated 2012-01-04 09:41:43) Review request for zookeeper and Benjamin Reed. Summary ------- Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified. With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first. This addresses bug ZOOKEEPER-442 . https://issues.apache.org/jira/browse/ZOOKEEPER-442 Diffs ----- src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20 src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/server/DataTree.java 757a572 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74 src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815 src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803 src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION src/zookeeper.jute d24e145 Diff: https://reviews.apache.org/r/3364/diff Testing ------- Added unit test that checks client side semantics. Thanks, Daniel
        Hide
        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/3364/#review4186
        -----------------------------------------------------------

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9433>

        is it needed? would be good to be sure before we commit

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9435>

        it would be good to document here why we are calling the async version (that this is the only way to ensure that all pending watches have been processed before notifying that that watch was removed successfully).

        src/java/main/org/apache/zookeeper/ZooKeeper.java
        <https://reviews.apache.org/r/3364/#comment9434>

        ditto

        • Patrick

        On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/3364/

        -----------------------------------------------------------

        (Updated 2012-01-04 09:41:43)

        Review request for zookeeper and Benjamin Reed.

        Summary

        -------

        Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified.

        With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first.

        This addresses bug ZOOKEEPER-442.

        https://issues.apache.org/jira/browse/ZOOKEEPER-442

        Diffs

        -----

        src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40

        src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c

        src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b

        src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20

        src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0

        src/java/main/org/apache/zookeeper/server/DataTree.java 757a572

        src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a

        src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74

        src/java/main/org/apache/zookeeper/server/Request.java c6a2249

        src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815

        src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803

        src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION

        src/zookeeper.jute d24e145

        Diff: https://reviews.apache.org/r/3364/diff

        Testing

        -------

        Added unit test that checks client side semantics.

        Thanks,

        Daniel

        Show
        jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/#review4186 ----------------------------------------------------------- src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9433 > is it needed? would be good to be sure before we commit src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9435 > it would be good to document here why we are calling the async version (that this is the only way to ensure that all pending watches have been processed before notifying that that watch was removed successfully). src/java/main/org/apache/zookeeper/ZooKeeper.java < https://reviews.apache.org/r/3364/#comment9434 > ditto Patrick On 2012-01-04 09:41:43, Daniel Gómez Ferro wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3364/ ----------------------------------------------------------- (Updated 2012-01-04 09:41:43) Review request for zookeeper and Benjamin Reed. Summary ------- Added APIs to remove watches that are not needed anymore. If the removal completes successfully it is guaranteed that the watcher won't be notified. With the current semantics if two clients remove watches on a znode at the same time the watch is triggered, one could remove it successfully while the other could receive the notification first. This addresses bug ZOOKEEPER-442 . https://issues.apache.org/jira/browse/ZOOKEEPER-442 Diffs ----- src/java/main/org/apache/zookeeper/ClientCnxn.java 6c25e40 src/java/main/org/apache/zookeeper/KeeperException.java 7c10d2c src/java/main/org/apache/zookeeper/Watcher.java 36c7b5b src/java/main/org/apache/zookeeper/ZooDefs.java c7f1b20 src/java/main/org/apache/zookeeper/ZooKeeper.java e82eaa0 src/java/main/org/apache/zookeeper/server/DataTree.java 757a572 src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java 336827a src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java 1a80d74 src/java/main/org/apache/zookeeper/server/Request.java c6a2249 src/java/main/org/apache/zookeeper/server/WatchManager.java 0e7c815 src/java/main/org/apache/zookeeper/server/ZKDatabase.java 2842803 src/java/test/org/apache/zookeeper/test/RemoveWatchesTest.java PRE-CREATION src/zookeeper.jute d24e145 Diff: https://reviews.apache.org/r/3364/diff Testing ------- Added unit test that checks client side semantics. Thanks, Daniel
        Hide
        Patrick Hunt added a comment -

        Cancelling pending patch update. Please update the patch, incl RB, and resubmit. thanks.

        Show
        Patrick Hunt added a comment - Cancelling pending patch update. Please update the patch, incl RB, and resubmit. thanks.
        Hide
        Flavio Junqueira added a comment -

        Sounds like an interesting feature if anyone is looking for a low-hanging fruit to contribute. Shall we revisit this?

        Show
        Flavio Junqueira added a comment - Sounds like an interesting feature if anyone is looking for a low-hanging fruit to contribute. Shall we revisit this?
        Hide
        Edward Ribeiro added a comment -

        I certainly have an interest on tackling this issue, but I need to read the patch and code to see if I am up to task already. As it's targeted to 3.5 this should give time to get up to speed, I guess.

        PS: But if any other person feels like trying to tackle it immediately then I can pass on the opportunity too.

        Show
        Edward Ribeiro added a comment - I certainly have an interest on tackling this issue, but I need to read the patch and code to see if I am up to task already. As it's targeted to 3.5 this should give time to get up to speed, I guess. PS: But if any other person feels like trying to tackle it immediately then I can pass on the opportunity too.
        Hide
        Mahadev konar added a comment -

        Edward Ribeiro if you are interested, feel free to take it up. I'd be happy to provide guidance/other help on this.

        Thanks

        Show
        Mahadev konar added a comment - Edward Ribeiro if you are interested, feel free to take it up. I'd be happy to provide guidance/other help on this. Thanks
        Hide
        Rakesh R added a comment -

        Hi Mahadev konar, Flavio Junqueira
        Its an interesting feature and would like to take it up. I've gone through the discussion thread and just prepared a draft doc which will help me to move on. Could you please have a look, I'll try to understand more about the existing patch and review comments.

        Show
        Rakesh R added a comment - Hi Mahadev konar , Flavio Junqueira Its an interesting feature and would like to take it up. I've gone through the discussion thread and just prepared a draft doc which will help me to move on. Could you please have a look, I'll try to understand more about the existing patch and review comments.
        Hide
        Mahadev konar added a comment -

        Thanks Rakesh. Good to see the initiative. Ill read through the doc and get back to you.

        Show
        Mahadev konar added a comment - Thanks Rakesh. Good to see the initiative. Ill read through the doc and get back to you.
        Hide
        Patrick Hunt added a comment -

        Rakesh R just to make sure I'm understanding correctly, the document you put together is based on the current approach/patch, is that right? Or are you suggesting a different implementation? (my quick perusal indicates the former but I want to make sure I'm not missing something)

        Show
        Patrick Hunt added a comment - Rakesh R just to make sure I'm understanding correctly, the document you put together is based on the current approach/patch, is that right? Or are you suggesting a different implementation? (my quick perusal indicates the former but I want to make sure I'm not missing something)
        Hide
        Rakesh R added a comment -

        Yes, I'm following the current approach with few improvements. Before starting I just prepared doc to understand the background and the proposed apis.
        I'll try to upload the working patch in couple of days.

        Show
        Rakesh R added a comment - Yes, I'm following the current approach with few improvements. Before starting I just prepared doc to understand the background and the proposed apis. I'll try to upload the working patch in couple of days.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12608778/ZOOKEEPER-442.patch
        against trunk revision 1532152.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608778/ZOOKEEPER-442.patch against trunk revision 1532152. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1702//console This message is automatically generated.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12608862/ZOOKEEPER-442.patch
        against trunk revision 1532152.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608862/ZOOKEEPER-442.patch against trunk revision 1532152. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1703//console This message is automatically generated.
        Hide
        Edward Ribeiro added a comment -

        The offending javadoc warning in the latest patch:

        [exec] [javadoc] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java:2326: warning - Tag @see: can't find removeWatches(String, Watcher, WatcherType, boolean) in org.apache.zookeeper.ZooKeeper

        The offending line seem to be this: @see #removeWatches(String, Watcher, WatcherType, boolean)

        Just a guess: wouldn't it be @see

        {@link #removeWatches(String, Watcher, WatcherType, boolean)}

        ?

        Edward

        Show
        Edward Ribeiro added a comment - The offending javadoc warning in the latest patch: [exec] [javadoc] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java:2326: warning - Tag @see: can't find removeWatches(String, Watcher, WatcherType, boolean) in org.apache.zookeeper.ZooKeeper The offending line seem to be this: @see #removeWatches(String, Watcher, WatcherType, boolean) Just a guess: wouldn't it be @see {@link #removeWatches(String, Watcher, WatcherType, boolean)} ? Edward
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12608969/ZOOKEEPER-442.patch
        against trunk revision 1532152.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12608969/ZOOKEEPER-442.patch against trunk revision 1532152. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1704//console This message is automatically generated.
        Hide
        Rakesh R added a comment -

        Edward Ribeiro I've just modified the patch and now it works fine.

        Show
        Rakesh R added a comment - Edward Ribeiro I've just modified the patch and now it works fine.
        Hide
        Rakesh R added a comment -

        I'd appreciate any suggestions or comments about the fix and test cases. Thanks

        Show
        Rakesh R added a comment - I'd appreciate any suggestions or comments about the fix and test cases. Thanks
        Hide
        Rakesh R added a comment -

        Please see review request : https://reviews.apache.org/r/15256/

        Show
        Rakesh R added a comment - Please see review request : https://reviews.apache.org/r/15256/
        Hide
        Patrick Hunt added a comment -

        I've seen this issue reported by a few users now (around the resourcing issue). I think we should prioritize inclusion for 3.5.0. I'll see if I can spend some cycles on it. Thanks Rakesh.

        One thing I notice, it would be nice to add support to the command line shell for removing watches. Makes it easier to do ad-hoc testing.

        Show
        Patrick Hunt added a comment - I've seen this issue reported by a few users now (around the resourcing issue). I think we should prioritize inclusion for 3.5.0. I'll see if I can spend some cycles on it. Thanks Rakesh. One thing I notice, it would be nice to add support to the command line shell for removing watches. Makes it easier to do ad-hoc testing.
        Hide
        Rakesh R added a comment -

        Patrick Hunt Thank you for the interest. I created separate JIRA ZOOKEEPER-1830 to discuss and implement the same. IMHO this will help everyone to make the patches simple and easy to implement/review.

        Show
        Rakesh R added a comment - Patrick Hunt Thank you for the interest. I created separate JIRA ZOOKEEPER-1830 to discuss and implement the same. IMHO this will help everyone to make the patches simple and easy to implement/review.
        Hide
        Patrick Hunt added a comment -

        Good idea. I' d suggest a similar sub-jira for doc changes. (add remove watches detail to the programmer guide?)

        Show
        Patrick Hunt added a comment - Good idea. I' d suggest a similar sub-jira for doc changes. (add remove watches detail to the programmer guide?)
        Hide
        Patrick Hunt added a comment -

        I'm planning to commit this soon, would be good to get more eyes on this patch as it's new functionality. Anyone? Speak now....

        Show
        Patrick Hunt added a comment - I'm planning to commit this soon, would be good to get more eyes on this patch as it's new functionality. Anyone? Speak now....
        Hide
        Raul Gutierrez Segales added a comment -

        In:

        +        private boolean removeWatches(Map<String, Set<Watcher>> watches,
        +                Watcher watcher, String path, boolean local, int rc,
        +                Set<Watcher> removedWatchers) {
        +            // Check whether watches list contains the given watcher
        +            boolean containsWatcher = contains(path, watcher, watches);
        +            if (!containsWatcher) {
        +                // Watcher not found for the given params
        +                return false;
        +            }
        +
        +            // Couldn't remove locally as it doesn't sees a success from server
        +            if (!local && watcher == null && rc != Code.OK.intValue()) {
        +                return containsWatcher;
        

        shouldn't it return false since it couldn't remove in the server? At that point containsWatcher is true...

        The next part looks wrong too:

        +            } else if (!local && rc != Code.OK.intValue()) {
        +                Set<Watcher> watchers = watches.get(path);
        +                // Couldn't remove locally as it doesn't sees a success from
        +                // server
        +                if (watchers.size() == 1) {
        +                    return containsWatcher;
        +                } else if (watchers.remove(watcher)) {
        +                    // found path watcher
        +                    removedWatchers.add(watcher);
        +                    return containsWatcher;
        +                }
        

        why would watchers.size() == 1 imply that it was that the watches were removed?

        Also, would it make since if they API also allowed to remove all watchers for a given path? Right now it's all or nothing:

        +    /**
        +     * For the given znode path, removes the specified watcher.
        +     * 
        +     * <p>
        +     * If watcher is null all watchers for the given watcherType will be
        +     * removed, Otherwise only the specified watcher corresponding to the
        +     * watcherType will be removed.
        +     * <p>
        

        I think in that case it should be for that path. Or, alternatively, if path is "" or null then go ahead and remove all watchers regardless of their path. It also makes the documentation more precise (it starts by saying "for a given znode path" and then it says it can actually remove all watchers regardless).

        The change would be minimal.

        Show
        Raul Gutierrez Segales added a comment - In: + private boolean removeWatches(Map<String, Set<Watcher>> watches, + Watcher watcher, String path, boolean local, int rc, + Set<Watcher> removedWatchers) { + // Check whether watches list contains the given watcher + boolean containsWatcher = contains(path, watcher, watches); + if (!containsWatcher) { + // Watcher not found for the given params + return false; + } + + // Couldn't remove locally as it doesn't sees a success from server + if (!local && watcher == null && rc != Code.OK.intValue()) { + return containsWatcher; shouldn't it return false since it couldn't remove in the server? At that point containsWatcher is true... The next part looks wrong too: + } else if (!local && rc != Code.OK.intValue()) { + Set<Watcher> watchers = watches.get(path); + // Couldn't remove locally as it doesn't sees a success from + // server + if (watchers.size() == 1) { + return containsWatcher; + } else if (watchers.remove(watcher)) { + // found path watcher + removedWatchers.add(watcher); + return containsWatcher; + } why would watchers.size() == 1 imply that it was that the watches were removed? Also, would it make since if they API also allowed to remove all watchers for a given path? Right now it's all or nothing: + /** + * For the given znode path, removes the specified watcher. + * + * <p> + * If watcher is null all watchers for the given watcherType will be + * removed, Otherwise only the specified watcher corresponding to the + * watcherType will be removed. + * <p> I think in that case it should be for that path . Or, alternatively, if path is "" or null then go ahead and remove all watchers regardless of their path. It also makes the documentation more precise (it starts by saying "for a given znode path" and then it says it can actually remove all watchers regardless). The change would be minimal.
        Hide
        Raul Gutierrez Segales added a comment -

        One more nit in private boolean removeWatches:

        +                if (watcher == null) {
        +                    Set<Watcher> pathWatchers = watches.remove(path);
        +                    if (pathWatchers != null) {
        +                        // found path watchers
        +                        removedWatchers.addAll(pathWatchers);
        +                    }
        +                } else {
        +                    Set<Watcher> watchers = watches.get(path);
        +                    if (watcher != null) {
        +                        if (watchers.remove(watcher)) {
        +                            // found path watcher
        +                            removedWatchers.add(watcher);
        +                        }
        +                    }
        +                }
        

        the "if (watcher != null) {" check is superfluous since we are already in the else block for watcher == null. It affects readability so I would drop it.

        Show
        Raul Gutierrez Segales added a comment - One more nit in private boolean removeWatches: + if (watcher == null) { + Set<Watcher> pathWatchers = watches.remove(path); + if (pathWatchers != null) { + // found path watchers + removedWatchers.addAll(pathWatchers); + } + } else { + Set<Watcher> watchers = watches.get(path); + if (watcher != null) { + if (watchers.remove(watcher)) { + // found path watcher + removedWatchers.add(watcher); + } + } + } the "if (watcher != null) {" check is superfluous since we are already in the else block for watcher == null. It affects readability so I would drop it.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12620761/ZOOKEEPER-442.patch
        against trunk revision 1553693.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12620761/ZOOKEEPER-442.patch against trunk revision 1553693. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1867//console This message is automatically generated.
        Hide
        Rakesh R added a comment -

        Thanks Raul Gutierrez Segales for the comments and your time. I've updated latest patch by addressing these comments. Could you please have a look at it.

        I've done the following changes in the patch:

        1. Modified ZKWatchManager#contains logic.
        2. Splitted the contains check & removal logic into separate methods for better readability.
        3. Done removal of the watcher after getting the server response. In previous patch it was removing and not sending to the server in few conditions. Now api will do basic contains check and send the request to the server, then waiting for the response. Upon receiving the response, it will go for watcher removal.
        4. Added one test case for chroot scenario
        5. Added few logs in the test cases, for marking the steps.
        Show
        Rakesh R added a comment - Thanks Raul Gutierrez Segales for the comments and your time. I've updated latest patch by addressing these comments. Could you please have a look at it. I've done the following changes in the patch: Modified ZKWatchManager#contains logic. Splitted the contains check & removal logic into separate methods for better readability. Done removal of the watcher after getting the server response. In previous patch it was removing and not sending to the server in few conditions. Now api will do basic contains check and send the request to the server, then waiting for the response. Upon receiving the response, it will go for watcher removal. Added one test case for chroot scenario Added few logs in the test cases, for marking the steps.
        Hide
        Raul Gutierrez Segales added a comment -

        Sorry for the late response, Rakesh R. Thanks for the update. Here are a few more comments:

        Perf nit in src/java/main/org/apache/zookeeper/ClientCnxn.java:

        +            Set<Watcher> watchers = new HashSet<Watcher>();
        +            if (materializedWatchers == null) {
        +                // materialize the watchers based on the event
        +                watchers = watcher.materialize(event.getState(),
        +                        event.getType(), event.getPath());
        +            } else {
        +                watchers.addAll(materializedWatchers);
        +            }
        +            WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event);
        

        If materializedWatchers == null then calling watcher.materialize() will give you a new Set of watchers (overriding the previously instantiated HashSet). So you should only instantiate new HashSet<Watcher>() inside the else clause.

        In src/java/main/org/apache/zookeeper/ZooKeeper.java:

        +        public Map<EventType, Set<Watcher>> removeWatcher(String clientPath,
        +                Watcher watcher, WatcherType watcherType, boolean local, int rc)
        +                throws KeeperException {
        ....
        +            removedWatchers
        +                    .put(EventType.ChildWatchRemoved, childWatchersToRem);
        ....
        +            case Any: {
        +                removedWatchers.put(EventType.ChildWatchRemoved,
        +                        childWatchersToRem);
        

        The call to removedWatchers.put(EventType.ChildWatchRemoved, childWatchersToRem) inside the Any switch clause doesn't seem to be needed, no?

        Nit in:

        +    public void removeWatches(String path, Watcher watcher,
        +            WatcherType watcherType, boolean local)
        +            throws InterruptedException, KeeperException {
        +        final String clientPath = path;
        

        why is assigning to clientPath needed? In the previous comments the path param is referred as clientPath anyway, maybe renamed it?

        In:

        +        // Validating the existence of wacher, because there could be a case
        +        // where the watcher might be triggered just before receiving the remove
        +        // watch response. 
        +        try {
        +            watchManager.containsWatcher(clientPath, watcher, watcherType);
        +        } catch (NoWatcherException nwe) {
        +            LOG.error("Failed to find watcher!", nwe);
        +            cb.processResult(nwe.code().intValue(), clientPath, ctx);
        +            return;
        +        }
        

        I think we should just throw NoWatcherException otherwise how can the caller of this (async) method know if it failed to be scheduled? I think this is how other async methods behave when the remote call fails to be issued. They either raise an exception or return an error code instead of having their callback being called. The callback should only be called if the remote call was performed (so then it only gets server generated errors, if any).

        Besides these points, I am happy to give it a +1. Hope to see this merged soon! Awesome work.

        Show
        Raul Gutierrez Segales added a comment - Sorry for the late response, Rakesh R . Thanks for the update. Here are a few more comments: Perf nit in src/java/main/org/apache/zookeeper/ClientCnxn.java: + Set<Watcher> watchers = new HashSet<Watcher>(); + if (materializedWatchers == null) { + // materialize the watchers based on the event + watchers = watcher.materialize(event.getState(), + event.getType(), event.getPath()); + } else { + watchers.addAll(materializedWatchers); + } + WatcherSetEventPair pair = new WatcherSetEventPair(watchers, event); If materializedWatchers == null then calling watcher.materialize() will give you a new Set of watchers (overriding the previously instantiated HashSet). So you should only instantiate new HashSet<Watcher>() inside the else clause. In src/java/main/org/apache/zookeeper/ZooKeeper.java: + public Map<EventType, Set<Watcher>> removeWatcher(String clientPath, + Watcher watcher, WatcherType watcherType, boolean local, int rc) + throws KeeperException { .... + removedWatchers + .put(EventType.ChildWatchRemoved, childWatchersToRem); .... + case Any: { + removedWatchers.put(EventType.ChildWatchRemoved, + childWatchersToRem); The call to removedWatchers.put(EventType.ChildWatchRemoved, childWatchersToRem) inside the Any switch clause doesn't seem to be needed, no? Nit in: + public void removeWatches(String path, Watcher watcher, + WatcherType watcherType, boolean local) + throws InterruptedException, KeeperException { + final String clientPath = path; why is assigning to clientPath needed? In the previous comments the path param is referred as clientPath anyway, maybe renamed it? In: + // Validating the existence of wacher, because there could be a case + // where the watcher might be triggered just before receiving the remove + // watch response. + try { + watchManager.containsWatcher(clientPath, watcher, watcherType); + } catch (NoWatcherException nwe) { + LOG.error("Failed to find watcher!", nwe); + cb.processResult(nwe.code().intValue(), clientPath, ctx); + return; + } I think we should just throw NoWatcherException otherwise how can the caller of this (async) method know if it failed to be scheduled? I think this is how other async methods behave when the remote call fails to be issued. They either raise an exception or return an error code instead of having their callback being called. The callback should only be called if the remote call was performed (so then it only gets server generated errors, if any). Besides these points, I am happy to give it a +1. Hope to see this merged soon! Awesome work.
        Hide
        Rakesh R added a comment -

        Raul Gutierrez Segales thanks again for the review and comments.

        I think we should just throw NoWatcherException otherwise how can the caller of this (async) method know if it failed to be scheduled? I think this is how other async methods behave when the remote call fails to be issued. They either raise an exception or return an error code instead of having their callback being called. The callback should only be called if the remote call was performed (so then it only gets server generated errors, if any).

        In that case, here we should have throws KeeperException clause for the removeWatch async api. But I couldn't see any such async api which is throwing KeeperException. This validation is just to avoid server call and do fail-fast. I think this will not see as a deviation for the async behavior. If you agree with my point, I'll leave this part alone and fix other comments ?

        Show
        Rakesh R added a comment - Raul Gutierrez Segales thanks again for the review and comments. I think we should just throw NoWatcherException otherwise how can the caller of this (async) method know if it failed to be scheduled? I think this is how other async methods behave when the remote call fails to be issued. They either raise an exception or return an error code instead of having their callback being called. The callback should only be called if the remote call was performed (so then it only gets server generated errors, if any). In that case, here we should have throws KeeperException clause for the removeWatch async api. But I couldn't see any such async api which is throwing KeeperException. This validation is just to avoid server call and do fail-fast. I think this will not see as a deviation for the async behavior. If you agree with my point, I'll leave this part alone and fix other comments ?
        Hide
        Raul Gutierrez Segales added a comment -

        Thanks for the quick reply Rakesh R.

        Here's an asynchronous method that throws KeeperException:

        https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java#L1641

        I wonder what Patrick Hunt and Flavio Junqueira think. My expectation at least is that the callback doesn't get call when an async method fails before the RPC is issued. But you are right that currently it isn't, consistently, the case for all of the async methods in the client API.

        Show
        Raul Gutierrez Segales added a comment - Thanks for the quick reply Rakesh R . Here's an asynchronous method that throws KeeperException: https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java#L1641 I wonder what Patrick Hunt and Flavio Junqueira think. My expectation at least is that the callback doesn't get call when an async method fails before the RPC is issued. But you are right that currently it isn't, consistently, the case for all of the async methods in the client API.
        Hide
        Camille Fournier added a comment -

        That method shouldn't throw either of those exceptions, as none of the things it is calling will throw any of those exceptions. So reconfig is in error here.

        Show
        Camille Fournier added a comment - That method shouldn't throw either of those exceptions, as none of the things it is calling will throw any of those exceptions. So reconfig is in error here.
        Hide
        Raul Gutierrez Segales added a comment -

        Ah, good catch Camille Fournier - mind filing a bug for that?

        Though it still feels weird that the callback would be call from within the async dispatcher method. A bit tangential but the C implementation does error out early when it can't dispatch async requests (as opposed to calling the callback), i.e.:

        https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L3063

        I guess it is also true that this is the first async method that actually does some real client side validation before issuing the call? It will probably feel more natural for the caller to know right away that the call wasn't dispatched, as opposed to be informed through the callback.

        But this is minor so if everyone else is happy it would be very nice to see this merged!

        Show
        Raul Gutierrez Segales added a comment - Ah, good catch Camille Fournier - mind filing a bug for that? Though it still feels weird that the callback would be call from within the async dispatcher method. A bit tangential but the C implementation does error out early when it can't dispatch async requests (as opposed to calling the callback), i.e.: https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L3063 I guess it is also true that this is the first async method that actually does some real client side validation before issuing the call? It will probably feel more natural for the caller to know right away that the call wasn't dispatched, as opposed to be informed through the callback. But this is minor so if everyone else is happy it would be very nice to see this merged!
        Hide
        Patrick Hunt added a comment -

        So reconfig is in error here.

        Looks like it to me too. I don't see any other async calls using it. Must have sneaked in with reconfig by accident.

        I'm back from vacation now, so I'll try driving this again.

        Based on the most recent comments are there any other changes pending? Or should we consider this ready?

        Show
        Patrick Hunt added a comment - So reconfig is in error here. Looks like it to me too. I don't see any other async calls using it. Must have sneaked in with reconfig by accident. I'm back from vacation now, so I'll try driving this again. Based on the most recent comments are there any other changes pending? Or should we consider this ready?
        Hide
        Patrick Hunt added a comment -

        FYI: while I was on vacation I had this patch (well an earlier one) running continuously on my personal CI boxes and I didn't see any issue attributable to these changes. So from a units (new and existing) it looks pretty solid to me.

        I'll update my setup to this latest patch now.

        Show
        Patrick Hunt added a comment - FYI: while I was on vacation I had this patch (well an earlier one) running continuously on my personal CI boxes and I didn't see any issue attributable to these changes. So from a units (new and existing) it looks pretty solid to me. I'll update my setup to this latest patch now.
        Hide
        Rakesh R added a comment -

        Thanks everyone for the comments

        Ah its really good finding. Please file a JIRA for reconfig correction. This feature is marked for 3.4.5 release, I feel its the right time to discuss and come to a conclusion.

        Hi Patrick Hunt, I'll upload latest patch addressing Raul Gutierrez Segales comments.

        Show
        Rakesh R added a comment - Thanks everyone for the comments Ah its really good finding. Please file a JIRA for reconfig correction. This feature is marked for 3.4.5 release, I feel its the right time to discuss and come to a conclusion. Hi Patrick Hunt , I'll upload latest patch addressing Raul Gutierrez Segales comments.
        Hide
        Rakesh R added a comment -

        This feature is marked for 3.4.5 release

        Please excuse, small correction to my previous comment. Dynamic reconfig feature ZOOKEEPER-107 is marked for 3.5.0 release

        Show
        Rakesh R added a comment - This feature is marked for 3.4.5 release Please excuse, small correction to my previous comment. Dynamic reconfig feature ZOOKEEPER-107 is marked for 3.5.0 release
        Hide
        Rakesh R added a comment -

        why is assigning to clientPath needed? In the previous comments the path param is referred as clientPath anyway, maybe renamed it?

        I modified the javadoc path param and now in sync with other api implementation.

        Attached latest patch addressing Raul Gutierrez Segales comments. Please have a look at the patch.

        Show
        Rakesh R added a comment - why is assigning to clientPath needed? In the previous comments the path param is referred as clientPath anyway, maybe renamed it? I modified the javadoc path param and now in sync with other api implementation. Attached latest patch addressing Raul Gutierrez Segales comments. Please have a look at the patch.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12622496/ZOOKEEPER-442.patch
        against trunk revision 1556976.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622496/ZOOKEEPER-442.patch against trunk revision 1556976. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1875//console This message is automatically generated.
        Hide
        Patrick Hunt added a comment -

        Ok. I have this running in a continuous loop on my CI env.

        Show
        Patrick Hunt added a comment - Ok. I have this running in a continuous loop on my CI env.
        Hide
        Raul Gutierrez Segales added a comment -

        Created ZOOKEEPER-1860 to cleanup reconfig.

        Show
        Raul Gutierrez Segales added a comment - Created ZOOKEEPER-1860 to cleanup reconfig.
        Hide
        Patrick Hunt added a comment -

        I'm afraid something is not quite right. I created a very simple sanity check which seems to be failing. Please check my logic/code (note: I have some time this week and I'd like to keep the momentum going on this one.)

        In the first case the code I created will create a watch by calling exists on a non-existent node (call it "pre-removewatches"). It does this in a loop. I expect the memory to become exhausted on the client. In the second case I immediately call "removewatch" on the just-set-watch. In the second case I expect that client to run forever (essentially). However in both cases the clients crash. I am using the latest patch applied to trunk codebase for this testing. (notice the second case does run a bit longer than the first, but still, based on the code I have it should run ess. forever):

        first case:

        USE_REMOVE_CALL:false
        maxIter:100000
        maxLoops:10000
        Available free mem is 14614744
        j:0
        Available free mem is 12184072
        j:1
        Available free mem is 8827120
        j:2
        Available free mem is 6141912
        j:3
        Available free mem is 2600160
        j:4
        <OOMException>
        

        second case:

        USE_REMOVE_CALL:true
        maxIter:100000
        maxLoops:10000
        Available free mem is 14613488
        j:0
        Available free mem is 12497584
        j:1
        Available free mem is 9470512
        j:2
        Available free mem is 7167880
        j:3
        Available free mem is 3841616
        j:4
        Available free mem is 854728
        j:5
        <OOMException>
        

        I've posted my code to github, it's pretty simple. see:
        https://github.com/phunt/zookeeper-removewatch-ex/blob/master/src/test/java/org/phunt/zookeeper/removewatchexample/ExerciseRemoveWatchTest.java

        Notice the boolean USE_REMOVE_CALL at the top of the class. You can replicate the two output scenarios above by setting this to false (no remove) and true (call remove). Then running the test via "mvn test" at the toplevel. (note you need to install the 3.5.0-SNAPSHOT artifacts in your local maven cache, you can do this by running "ant package" in ZK and copying the resulting artifacts from the build subdir dist-maven directory into your .m2 repository)

        Does my code make sense? It seems to me that given we're calling exists (on non-existent node), setting a watch, then immediately removing the watch, that the code should run w/o running out of memory.

        I suspect the issue is that we are setting up some data structures to hold the watch information, but not fully clearing that data when removing the watch

        any ideas?

        Show
        Patrick Hunt added a comment - I'm afraid something is not quite right. I created a very simple sanity check which seems to be failing. Please check my logic/code (note: I have some time this week and I'd like to keep the momentum going on this one.) In the first case the code I created will create a watch by calling exists on a non-existent node (call it "pre-removewatches"). It does this in a loop. I expect the memory to become exhausted on the client. In the second case I immediately call "removewatch" on the just-set-watch. In the second case I expect that client to run forever (essentially). However in both cases the clients crash. I am using the latest patch applied to trunk codebase for this testing. (notice the second case does run a bit longer than the first, but still, based on the code I have it should run ess. forever): first case: USE_REMOVE_CALL:false maxIter:100000 maxLoops:10000 Available free mem is 14614744 j:0 Available free mem is 12184072 j:1 Available free mem is 8827120 j:2 Available free mem is 6141912 j:3 Available free mem is 2600160 j:4 <OOMException> second case: USE_REMOVE_CALL:true maxIter:100000 maxLoops:10000 Available free mem is 14613488 j:0 Available free mem is 12497584 j:1 Available free mem is 9470512 j:2 Available free mem is 7167880 j:3 Available free mem is 3841616 j:4 Available free mem is 854728 j:5 <OOMException> I've posted my code to github, it's pretty simple. see: https://github.com/phunt/zookeeper-removewatch-ex/blob/master/src/test/java/org/phunt/zookeeper/removewatchexample/ExerciseRemoveWatchTest.java Notice the boolean USE_REMOVE_CALL at the top of the class. You can replicate the two output scenarios above by setting this to false (no remove) and true (call remove). Then running the test via "mvn test" at the toplevel. (note you need to install the 3.5.0-SNAPSHOT artifacts in your local maven cache, you can do this by running "ant package" in ZK and copying the resulting artifacts from the build subdir dist-maven directory into your .m2 repository) Does my code make sense? It seems to me that given we're calling exists (on non-existent node), setting a watch, then immediately removing the watch, that the code should run w/o running out of memory. I suspect the issue is that we are setting up some data structures to hold the watch information, but not fully clearing that data when removing the watch any ideas?
        Hide
        Rakesh R added a comment -

        Thanks Patrick Hunt for the scenario. Looks like test code is fine and I will try exeucte the same in my env.

        Show
        Rakesh R added a comment - Thanks Patrick Hunt for the scenario. Looks like test code is fine and I will try exeucte the same in my env.
        Hide
        Rakesh R added a comment -

        Awsome test.

        I think I found one leak, where the map is growing even after the watcher removal, like <"/0000000000/e-0000000000-0000000000", []> with empty watcher list.

        In the test case, exists call is keep on adding pre-watchers to the newly formed non-existed path as shown below:
        /0000000000/e-0000000000-0000000000
        /0000000000/e-0000000000-0000000001
        /0000000000/e-0000000000-0000000002
        /0000000000/e-0000000000-0000000003
        /0000000000/e-0000000000-0000000004
        /0000000000/e-0000000000-0000000005

        private final Map<String, Set<Watcher>> existWatches = new HashMap<String, Set<Watcher>>();
        When I observe the 'existWatches' datastructure, on every zk.exists call, it is adding one entry like:

        {/0000000000/e-0000000000-0000000000=[org.apache.zookeeper.ExerciseRemoveWatchTest$IgnoringWatcher@10e4fd7]}

        While removing I had missed one condition where I removed the watcher and but leaving an entry in the map with empty watcher list. This is growing and is causing the trouble.

        {/0000000000/e-0000000000-0000000000=[]} {/0000000000/e-0000000000-0000000001=[]} {/0000000000/e-0000000000-0000000002=[]}
        I have to do the map cleanup when 'watchers.size() <= 0' in ZKWatchManager#removeWatches
                            Set<Watcher> watchers = pathVsWatcher.get(path);
                            if (watchers != null) {
                                if (watchers.remove(watcher)) {
                                    // found path watcher
                                    removedWatchers.add(watcher);
                                    // cleanup <path vs watchlist>
                                    if (watchers.size() <= 0) {
                                        pathVsWatcher.remove(path);
                                    }
                                    success = true;
                                }
                            }
        
        Show
        Rakesh R added a comment - Awsome test. I think I found one leak, where the map is growing even after the watcher removal, like <"/0000000000/e-0000000000-0000000000", []> with empty watcher list. In the test case, exists call is keep on adding pre-watchers to the newly formed non-existed path as shown below: /0000000000/e-0000000000-0000000000 /0000000000/e-0000000000-0000000001 /0000000000/e-0000000000-0000000002 /0000000000/e-0000000000-0000000003 /0000000000/e-0000000000-0000000004 /0000000000/e-0000000000-0000000005 private final Map<String, Set<Watcher>> existWatches = new HashMap<String, Set<Watcher>>(); When I observe the 'existWatches' datastructure, on every zk.exists call, it is adding one entry like: {/0000000000/e-0000000000-0000000000=[org.apache.zookeeper.ExerciseRemoveWatchTest$IgnoringWatcher@10e4fd7]} While removing I had missed one condition where I removed the watcher and but leaving an entry in the map with empty watcher list. This is growing and is causing the trouble. {/0000000000/e-0000000000-0000000000=[]} {/0000000000/e-0000000000-0000000001=[]} {/0000000000/e-0000000000-0000000002=[]} I have to do the map cleanup when 'watchers.size() <= 0' in ZKWatchManager#removeWatches Set<Watcher> watchers = pathVsWatcher.get(path); if (watchers != null ) { if (watchers.remove(watcher)) { // found path watcher removedWatchers.add(watcher); // cleanup <path vs watchlist> if (watchers.size() <= 0) { pathVsWatcher.remove(path); } success = true ; } }
        Hide
        Patrick Hunt added a comment -

        We just need to be careful that the concurrency control is such that no one can have one of the "watchers" objects in hand, about to add an entry to it, while you are checking the size. (I haven't checked for this, I'm just saying we need to verify this can't happen)

        Show
        Patrick Hunt added a comment - We just need to be careful that the concurrency control is such that no one can have one of the "watchers" objects in hand, about to add an entry to it, while you are checking the size. (I haven't checked for this, I'm just saying we need to verify this can't happen)
        Hide
        Rakesh R added a comment -

        Hi Patrick Hunt, could you please look at the latest patch. I ran the same test case in my env and seems ok till now.

        Show
        Rakesh R added a comment - Hi Patrick Hunt , could you please look at the latest patch. I ran the same test case in my env and seems ok till now.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12623108/ZOOKEEPER-442.patch
        against trunk revision 1557875.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623108/ZOOKEEPER-442.patch against trunk revision 1557875. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1886//console This message is automatically generated.
        Hide
        Rakesh R added a comment -

        I couldn't see failures in the previous build.

        0 failures (±0)
        

        Attached new patch where I reduced timeout to improve test execution time.

        Show
        Rakesh R added a comment - I couldn't see failures in the previous build. 0 failures (±0) Attached new patch where I reduced timeout to improve test execution time.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12623135/ZOOKEEPER-442.patch
        against trunk revision 1557875.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623135/ZOOKEEPER-442.patch against trunk revision 1557875. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1888//console This message is automatically generated.
        Hide
        Patrick Hunt added a comment -

        Looks like the failure was on the c tests (see the console, unfortunately jenkins doesn't know how to report this in the test report page)

             [exec]      [exec] Zookeeper_watchers::testChildWatcher1 : elapsed 105 : OK
             [exec]      [exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK
             [exec]      [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual  : 0]
             [exec]      [exec] Failures !!!
        

        I don't think it's related to this though.

        Show
        Patrick Hunt added a comment - Looks like the failure was on the c tests (see the console, unfortunately jenkins doesn't know how to report this in the test report page) [exec] [exec] Zookeeper_watchers::testChildWatcher1 : elapsed 105 : OK [exec] [exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK [exec] [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual : 0] [exec] [exec] Failures !!! I don't think it's related to this though.
        Hide
        Raul Gutierrez Segales added a comment -

        Sorry guys to distract you from fixing the leak and the tests but there's one small code smell (imho) in the last version of the patch:

        +                            if (watchers.size() <= 0) {
        +                                pathVsWatcher.remove(path);
        +                            }
        

        watchers.size() == 0 should suffice, if (as pointed out by Patrick) all our access around watchers is synchronized. We don't see to be enforcing that so we might have some races around that.

        Show
        Raul Gutierrez Segales added a comment - Sorry guys to distract you from fixing the leak and the tests but there's one small code smell (imho) in the last version of the patch: + if (watchers.size() <= 0) { + pathVsWatcher.remove(path); + } watchers.size() == 0 should suffice, if (as pointed out by Patrick) all our access around watchers is synchronized. We don't see to be enforcing that so we might have some races around that.
        Hide
        Patrick Hunt added a comment -

        Ok, thanks Rakesh. I have the latest running on my CI test harness again.

        Show
        Patrick Hunt added a comment - Ok, thanks Rakesh. I have the latest running on my CI test harness again.
        Hide
        Patrick Hunt added a comment -

        The latest patch looks like it fixed the issue I client mem issue I mentioned:

        USE_REMOVE_CALL:true
        maxIter:100000
        maxLoops:10000
        Available free mem is 14615920
        j:0
        Available free mem is 15589384
        j:1
        Available free mem is 15582608
        j:2
        Available free mem is 15589824
        j:3
        Available free mem is 15565192
        j:4
        Available free mem is 15589824
        j:5
        Available free mem is 15589824
        j:6
        Available free mem is 15589824
        j:7
        Available free mem is 15581544
        j:8
        Available free mem is 15589824
        j:9
        Available free mem is 15589824
        j:10
        Available free mem is 15589824
        j:11
        

        Nice!

        Show
        Patrick Hunt added a comment - The latest patch looks like it fixed the issue I client mem issue I mentioned: USE_REMOVE_CALL:true maxIter:100000 maxLoops:10000 Available free mem is 14615920 j:0 Available free mem is 15589384 j:1 Available free mem is 15582608 j:2 Available free mem is 15589824 j:3 Available free mem is 15565192 j:4 Available free mem is 15589824 j:5 Available free mem is 15589824 j:6 Available free mem is 15589824 j:7 Available free mem is 15581544 j:8 Available free mem is 15589824 j:9 Available free mem is 15589824 j:10 Available free mem is 15589824 j:11 Nice!
        Hide
        Patrick Hunt added a comment -

        Please feel free to update the patch.

        Can you both (Rakesh/Raul) also code review the synchronization around that code?

        Show
        Patrick Hunt added a comment - Please feel free to update the patch. Can you both (Rakesh/Raul) also code review the synchronization around that code?
        Hide
        Patrick Hunt added a comment -

        I also tried running the same experiment while monitoring the free memory of the ZK Server. I started a standalone zk server with 15m max/initial memory and used jconsole to monitor heap usage while running the zookeeper-removewatches-ex client. With the remove watch call the heap stayed flat over time, 30 minutes or so, after an initial warm up period. (w/o the remove call the server quickly fails with oom exception)

        Today I plan to add a few more types of remove call scenarios to my simple client and verify things are handled properly, however things look good at the moment. I'll try and see if I can dig up my copy of yourkit as well.

        Only other thing I see left is to finalize the code review feedback (I hope to do additional code review later today). Anyone else that can help with this would be appreciated.

        Show
        Patrick Hunt added a comment - I also tried running the same experiment while monitoring the free memory of the ZK Server. I started a standalone zk server with 15m max/initial memory and used jconsole to monitor heap usage while running the zookeeper-removewatches-ex client. With the remove watch call the heap stayed flat over time, 30 minutes or so, after an initial warm up period. (w/o the remove call the server quickly fails with oom exception) Today I plan to add a few more types of remove call scenarios to my simple client and verify things are handled properly, however things look good at the moment. I'll try and see if I can dig up my copy of yourkit as well. Only other thing I see left is to finalize the code review feedback (I hope to do additional code review later today). Anyone else that can help with this would be appreciated.
        Hide
        Rakesh R added a comment -

        Thanks again Patrick Hunt, Raul Gutierrez Segales for the great help.

        Sure, I'll once again go through the code closely and see the synchronizations/improvements(if any).

        watchers.size() == 0 should suffice, if (as pointed out by Patrick) all our access around watchers is synchronized. We don't see to be enforcing that so we might have some races around that.

        Hi Raul, Are you pointing me to see the synchronization of the newly added code.
        The new logic included to fix the leak is already inside the synchronization block as shown below, so this part of the code is safer.

        synchronized (existWatches) {
        +                    boolean removedDataWatcher = removeWatches(existWatches,
        +                            watcher, clientPath, local, rc, dataWatchersToRem);
        +                    removedWatcher |= removedDataWatcher;
        +                }
        
        Show
        Rakesh R added a comment - Thanks again Patrick Hunt , Raul Gutierrez Segales for the great help. Sure, I'll once again go through the code closely and see the synchronizations/improvements(if any). watchers.size() == 0 should suffice, if (as pointed out by Patrick) all our access around watchers is synchronized. We don't see to be enforcing that so we might have some races around that. Hi Raul, Are you pointing me to see the synchronization of the newly added code. The new logic included to fix the leak is already inside the synchronization block as shown below, so this part of the code is safer. synchronized (existWatches) { + boolean removedDataWatcher = removeWatches(existWatches, + watcher, clientPath, local, rc, dataWatchersToRem); + removedWatcher |= removedDataWatcher; + }
        Hide
        Patrick Hunt added a comment -

        Rakesh R I think that Raul's point was/is that checking for less than zero is not necessary. My point was just a note for reviewers - that we need to ensure the concurrency is correct. (and I hear that you are saying that it is/should be)

        FYI my CI environment has been running all day with the latest patch, no issues so far.

        Show
        Patrick Hunt added a comment - Rakesh R I think that Raul's point was/is that checking for less than zero is not necessary. My point was just a note for reviewers - that we need to ensure the concurrency is correct. (and I hear that you are saying that it is/should be) FYI my CI environment has been running all day with the latest patch, no issues so far.
        Hide
        Raul Gutierrez Segales added a comment -

        I read this again and I am happy with this in terms of handling synchronization, so +1. Nice work.

        Show
        Raul Gutierrez Segales added a comment - I read this again and I am happy with this in terms of handling synchronization, so +1. Nice work.
        Hide
        Patrick Hunt added a comment -

        My code review looks good. Great job - including the number/scope of the tests!

        Actually the tests is where I saw my only nit feedback, the user of small values for SESSION_TIME_OUT and TIME_OUT will result in similar issues as I recently raised re slow/underprovisioned hardware. I'd suggest you increase these (see the existing tests, they typically use org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT).

        If you submit a patch to address the timeout I'm +1 and ready to commit.

        Show
        Patrick Hunt added a comment - My code review looks good. Great job - including the number/scope of the tests! Actually the tests is where I saw my only nit feedback, the user of small values for SESSION_TIME_OUT and TIME_OUT will result in similar issues as I recently raised re slow/underprovisioned hardware. I'd suggest you increase these (see the existing tests, they typically use org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT). If you submit a patch to address the timeout I'm +1 and ready to commit.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12623890/ZOOKEEPER-442.patch
        against trunk revision 1558950.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12623890/ZOOKEEPER-442.patch against trunk revision 1558950. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1893//console This message is automatically generated.
        Hide
        Rakesh R added a comment -

        Sorry for the delay, I was travelling last few days.

        Thanks Raul/Patrick for the reviews. I have updated the patch by using the CONNECTION_TIMEOUT.

        TIME_OUT is operation timeout and now in latest patch it is CONNECTION_TIMEOUT/5.

        How the operation timeout is being used in the tests:
        For example:
        case #1 : Say user is expecting to remove only WatcherType.Children type. Now will wait for CONNECTION_TIMEOUT/5 to see the result of the removal operation.
        case #2 : Say user is expecting to remove only WatcherType.Children type. Now after seeing successful removal of child watches, test is waiting for CONNECTION_TIMEOUT/5. After timeout test will verify that whether the call didn't touch any of the Data/Exists watches.

        If I increase this to a high value will affect the test execution time. Are you agreeing to keep operation timeout as CONNECTION_TIMEOUT/5 milliseconds?

        Show
        Rakesh R added a comment - Sorry for the delay, I was travelling last few days. Thanks Raul/Patrick for the reviews. I have updated the patch by using the CONNECTION_TIMEOUT. TIME_OUT is operation timeout and now in latest patch it is CONNECTION_TIMEOUT/5. How the operation timeout is being used in the tests: For example: case #1 : Say user is expecting to remove only WatcherType.Children type. Now will wait for CONNECTION_TIMEOUT/5 to see the result of the removal operation. case #2 : Say user is expecting to remove only WatcherType.Children type. Now after seeing successful removal of child watches, test is waiting for CONNECTION_TIMEOUT/5. After timeout test will verify that whether the call didn't touch any of the Data/Exists watches. If I increase this to a high value will affect the test execution time. Are you agreeing to keep operation timeout as CONNECTION_TIMEOUT/5 milliseconds?
        Hide
        Patrick Hunt added a comment -

        Are you agreeing to keep operation timeout as CONNECTION_TIMEOUT/5 milliseconds?

        sounds reasonable to me. Given you are using TIME_OUT in this way, perhaps you should use a lower value? On slow machines you might miss something, but generally you won't (regular speed machines you'd catch any problems). I think this is an OK tradeoff if you want to do that.

        Show
        Patrick Hunt added a comment - Are you agreeing to keep operation timeout as CONNECTION_TIMEOUT/5 milliseconds? sounds reasonable to me. Given you are using TIME_OUT in this way, perhaps you should use a lower value? On slow machines you might miss something, but generally you won't (regular speed machines you'd catch any problems). I think this is an OK tradeoff if you want to do that.
        Hide
        Rakesh R added a comment -

        yeah I understood it and I'll keep an eye on the tests. Hope you are OK with the latest patch.

        Show
        Rakesh R added a comment - yeah I understood it and I'll keep an eye on the tests. Hope you are OK with the latest patch.
        Hide
        Patrick Hunt added a comment -

        Committed to trunk. Thanks Rakesh, Daniel, Raul, et. al. Great new feature!

        Show
        Patrick Hunt added a comment - Committed to trunk. Thanks Rakesh, Daniel, Raul, et. al. Great new feature!
        Hide
        Rakesh R added a comment -

        Happy to see it. Thanks a lot Patrick, Daniel, Raul, Camille, Ribeiro, et. al for the good support

        Show
        Rakesh R added a comment - Happy to see it. Thanks a lot Patrick, Daniel, Raul, Camille, Ribeiro, et. al for the good support
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in ZooKeeper-trunk #2195 (See https://builds.apache.org/job/ZooKeeper-trunk/2195/)
        ZOOKEEPER-442. need a way to remove watches that are no longer of interest (Rakesh R, Daniel Gómez Ferro via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560904)

        • /zookeeper/trunk/CHANGES.txt
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/KeeperException.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/WatchDeregistration.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/Watcher.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooDefs.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/Request.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/WatchManager.java
        • /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZKDatabase.java
        • /zookeeper/trunk/src/java/test/org/apache/zookeeper/RemoveWatchesTest.java
        • /zookeeper/trunk/src/zookeeper.jute
        Show
        Hudson added a comment - SUCCESS: Integrated in ZooKeeper-trunk #2195 (See https://builds.apache.org/job/ZooKeeper-trunk/2195/ ) ZOOKEEPER-442 . need a way to remove watches that are no longer of interest (Rakesh R, Daniel Gómez Ferro via phunt) (phunt: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1560904 ) /zookeeper/trunk/CHANGES.txt /zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/KeeperException.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/WatchDeregistration.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/Watcher.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooDefs.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/DataTree.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/Request.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/WatchManager.java /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/ZKDatabase.java /zookeeper/trunk/src/java/test/org/apache/zookeeper/RemoveWatchesTest.java /zookeeper/trunk/src/zookeeper.jute

          People

          • Assignee:
            Rakesh R
            Reporter:
            Benjamin Reed
          • Votes:
            10 Vote for this issue
            Watchers:
            21 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development