Patch looks good.
+ LOG.warn(zkw.prefix("Unable to set watcher on znode (" + znode + ")"), keeperEx);
... but the method says it's checkExists w/o setting a watch, so the "Unable to set watcher" message does not match what the method does.
I think this is a bad idea; i.e. sleeping w/o interrupt. How long is SOCKET_RETRY_WAIT_MS? What if we try to stop the hosting server in the meantime? Do we have to wait for this to come up out of the loop?
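Something along these lines would let the wait be cut short on stop or interrupt. This is only a sketch to show the idea, not code from the patch: StoppableWait, stop() and waitBeforeRetry() are made-up names, and the wait length would be whatever SOCKET_RETRY_WAIT_MS is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not in the patch): a retry wait that wakes up early on stop. */
class StoppableWait {
    // Released once when the hosting server is asked to stop.
    private final CountDownLatch stopped = new CountDownLatch(1);

    /** Called by whoever shuts the server down. */
    void stop() {
        stopped.countDown();
    }

    /**
     * Waits up to waitMs before the next retry.
     * Returns true if the full wait elapsed (caller may retry),
     * false if stop was requested or the thread was interrupted.
     */
    boolean waitBeforeRetry(long waitMs) {
        try {
            // await() returns true when the latch was released, false on timeout.
            return !stopped.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for callers further up the stack.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The retry loop would then do `if (!waiter.waitBeforeRetry(SOCKET_RETRY_WAIT_MS)) break;` instead of an uninterruptible sleep.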
Passing 0, are we supposed to try once only? My guess is that we could try more than once given how the loop runs; i.e. we may loop multiple times in the same millisecond. You might want to exit the loop if timeout is zero.
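To make the zero case explicit, the loop could look roughly like the sketch below. It assumes that a timeout of 0 is meant as "try exactly once"; BoundedRetry, retry() and the retryWaitMs parameter are illustrative names, not anything in the patch.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch (not in the patch): retry until success or deadline. */
class BoundedRetry {
    /**
     * Runs attempt until it returns true or timeoutMs has elapsed.
     * Assumption: timeoutMs == 0 means a single attempt, never a second one.
     */
    static boolean retry(BooleanSupplier attempt, long timeoutMs, long retryWaitMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (timeoutMs == 0) {
                return false; // explicit exit: do not rely on the clock having advanced
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // out of time
            }
            try {
                // Never sleep past the deadline.
                long waitNanos = Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(retryWaitMs));
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // give up promptly when interrupted
            }
        }
    }
}
```

Checking the zero case before reading the clock avoids the problem of two iterations landing in the same millisecond.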
What happens if a client comes in during this time? Will it crash out immediately because there is no base node?