I realize that synchronization will fix this if we assume that only one process, and one object in that process, accesses this resource. But simply catching and ignoring the NoNodeException will also fix the problem without those assumptions. Synchronization is great when the resource being protected is private process memory, but that's not true in this case. ZooKeeper is a cluster-wide resource, and it's possible that any other process in the cluster could mutate ZooKeeper at any time. The way I see it, there are at least three options to solve this problem.
- use Java synchronization with the assumptions stated above
- use ZooKeeper primitives for dealing with concurrency
- use Java synchronization and ZooKeeper primitives for dealing with concurrency
I am in favor of #2. It's also very simple in this case: just ignore NoNodeException, because it indicates the node was deleted after the call to getChildren() was made.
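To make the suggestion concrete, here is a minimal sketch of the "ignore NoNodeException" approach. It is not the actual patch. So that it runs without the ZooKeeper client library, it uses a hypothetical in-memory FakeStore and a local NoNodeException class as stand-ins for the real client and KeeperException.NoNodeException. The names FakeStore, deleteAfterNextList() and SnapshotSketch are invented for illustration, and getList() here only mirrors the shape of the method under review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SnapshotSketch {

    /** Local stand-in for KeeperException.NoNodeException (assumption: keeps the sketch dependency-free). */
    static class NoNodeException extends Exception {
        NoNodeException(String name) {
            super("no node: " + name);
        }
    }

    /** Hypothetical in-memory stand-in for the ZooKeeper client; not the real API. */
    static class FakeStore {
        private final Map<String, String> nodes = new TreeMap<>();
        private String racingDelete;

        void put(String name, String data) {
            nodes.put(name, data);
        }

        /** Simulates another process deleting a child right after the next listing. */
        void deleteAfterNextList(String name) {
            racingDelete = name;
        }

        List<String> getChildren() {
            List<String> names = new ArrayList<>(nodes.keySet());
            if (racingDelete != null) {
                nodes.remove(racingDelete); // the "other process" wins the race
                racingDelete = null;
            }
            return names;
        }

        String getData(String name) throws NoNodeException {
            String data = nodes.get(name);
            if (data == null) {
                throw new NoNodeException(name);
            }
            return data;
        }
    }

    /** Reads every child that still exists; children deleted after the listing are skipped. */
    static List<String> getList(FakeStore store) {
        List<String> result = new ArrayList<>();
        for (String child : store.getChildren()) {
            try {
                result.add(store.getData(child));
            } catch (NoNodeException e) {
                // The node was deleted after getChildren() returned; nothing to report for it.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        store.put("server1", "a");
        store.put("server2", "b");
        store.put("server3", "c");
        store.deleteAfterNextList("server2");
        System.out.println(getList(store)); // prints [a, c]
    }
}
```

No lock is needed for this to be correct: the listing is treated as a snapshot, and a child that vanishes afterwards is simply left out, whichever process deleted it.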
I've made post() also synchronized so that getList() doesn't miss any dead servers that get added after zoo.getChildren() is called.
It will still miss those servers. When both methods are synchronized, if one thread is in getList() then another thread calling post() will block. Detecting changes after getChildren() is called is not needed; we just need a consistent snapshot at a point in time. That could be achieved by calling getChildren() in a loop and waiting for the result to stabilize. But the data could still be outdated by other operations immediately after getList() returns, so the code still has to treat it as a snapshot. Anything more would require some sort of transaction semantics across method calls, which is not needed.