Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.12.0
None
None
Description
If PathChildrenCache is started while ZooKeeper is unavailable for long enough to exhaust the operation retries, and the parent node does not exist, then PathChildrenCache no longer watches for changes once the connection to ZooKeeper is restored.
Root cause: PathChildrenCache uses EnsureContainers, which has the following logic:
    private synchronized void internalEnsure() throws Exception
    {
        if ( ensureNeeded.compareAndSet(true, false) )
        {
            client.createContainers(path);
        }
    }
This logic ignores the result of the operation: the ensureNeeded flag is cleared before client.createContainers runs, so even if that call throws an exception and the nodes are not created, EnsureContainers will not try again on the next call.
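One possible way to make the flag reflect the outcome is sketched below. This is only an illustration, not the project's actual fix: RetryingEnsure and ContainerCreator are hypothetical names, with ContainerCreator standing in for the client.createContainers(path) call so the example runs without Curator or a ZooKeeper server. The idea is to re-arm ensureNeeded when creation fails so that the next call retries.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryingEnsure {
    /** Hypothetical stand-in for client.createContainers(path). */
    public interface ContainerCreator {
        void createContainers(String path) throws Exception;
    }

    private final ContainerCreator client;
    private final String path;
    private final AtomicBoolean ensureNeeded = new AtomicBoolean(true);

    public RetryingEnsure(ContainerCreator client, String path) {
        this.client = client;
        this.path = path;
    }

    public void ensure() throws Exception {
        if (ensureNeeded.get()) {
            internalEnsure();
        }
    }

    private synchronized void internalEnsure() throws Exception {
        if (ensureNeeded.compareAndSet(true, false)) {
            try {
                client.createContainers(path);
            } catch (Exception e) {
                // Creation failed: re-arm the flag so the next ensure() retries.
                ensureNeeded.set(true);
                throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // The simulated client fails on the first call only.
        RetryingEnsure ensure = new RetryingEnsure(p -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("simulated ConnectionLoss for " + p);
            }
        }, "/test");

        try {
            ensure.ensure();
        } catch (IllegalStateException expected) {
            System.out.println("first attempt failed");
        }
        ensure.ensure(); // retried, succeeds this time
        ensure.ensure(); // no-op, containers already ensured
        System.out.println("createContainers calls: " + calls[0]); // prints 2
    }
}
```

With the original logic the second ensure() would be a no-op and the simulated client would be called only once.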
Example of the exception:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
    at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
    at org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
    at org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:490)
    at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
As a result the watcher registered in org.apache.curator.framework.recipes.cache.PathChildrenCache#refresh is not triggered.
Test to reproduce:
    @Test
    public void test() throws Exception
    {
        // Create the test server without starting it, so ZooKeeper is initially unavailable.
        TestingServer zkTestServer = new TestingServer(2181, false);
        CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
            zkTestServer.getConnectString(), 5000, 1000, new RetryOneTime(100)
        );
        curatorFramework.start();

        PathChildrenCache cache = new PathChildrenCache(curatorFramework, "/test", true);
        cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);

        // Wait long enough for the cache's initial operations to exhaust their retries.
        Thread.sleep(5000);

        // Bring ZooKeeper up, then create the parent node and a child.
        zkTestServer.start();
        curatorFramework.create().creatingParentContainersIfNeeded().forPath("/test/example");

        // Poll the cache; with this bug the new child never shows up.
        while ( true )
        {
            Thread.sleep(1000);
            System.out.println(cache.getCurrentData());
        }
    }