Apache Curator / CURATOR-422

PathChildrenCache does not tolerate a failed connection to ZK on startup

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.12.0
    • Fix Version/s: None
    • Component/s: Recipes
    • Labels:
      None

      Description

      If PathChildrenCache is started while ZooKeeper is unavailable for long enough to exhaust the operation retries, and the parent node does not exist, then PathChildrenCache no longer watches for changes once the connection to ZooKeeper is restored.
      Root cause: PathChildrenCache uses EnsureContainers, which has the following logic:

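      private synchronized void internalEnsure() throws Exception
      {
          // ensureNeeded is cleared *before* the call below...
          if ( ensureNeeded.compareAndSet(true, false) )
          {
              // ...so if this throws (e.g. ConnectionLoss), the flag stays false
              // and later calls never attempt to create the containers again
              client.createContainers(path);
          }
      }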
      

      This logic ignores the result of the operation. ensureNeeded is set to false before client.createContainers is called, so even if createContainers throws an exception and the nodes are not created, EnsureContainers will not try to create them again on the next call.
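      To make the failure mode concrete, the following is a minimal, self-contained model of the quoted logic together with one possible fix, assuming the only change needed is to re-arm ensureNeeded when createContainers throws. ContainerCreator, EnsureOnce, FailsOnceCreator and EnsureDemo are hypothetical names, not Curator types; FailsOnceCreator stands in for a ZooKeeper that is unreachable on the first attempt. This is a sketch, not Curator's actual implementation or patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for client.createContainers(path); not a Curator type.
interface ContainerCreator {
    void createContainers(String path) throws Exception;
}

// Hypothetical model of EnsureContainers.internalEnsure().
// rearmOnFailure = false reproduces the quoted behaviour; true is the sketched fix.
class EnsureOnce {
    private final AtomicBoolean ensureNeeded = new AtomicBoolean(true);
    private final ContainerCreator creator;
    private final String path;
    private final boolean rearmOnFailure;

    EnsureOnce(ContainerCreator creator, String path, boolean rearmOnFailure) {
        this.creator = creator;
        this.path = path;
        this.rearmOnFailure = rearmOnFailure;
    }

    synchronized void ensure() throws Exception {
        if (ensureNeeded.compareAndSet(true, false)) {
            try {
                creator.createContainers(path);
            } catch (Exception e) {
                if (rearmOnFailure) {
                    ensureNeeded.set(true); // creation failed: retry on the next ensure()
                }
                throw e;
            }
        }
    }
}

// Hypothetical creator that fails its first call (like a ConnectionLoss) and then succeeds.
class FailsOnceCreator implements ContainerCreator {
    int calls = 0;

    @Override
    public void createContainers(String path) throws Exception {
        calls++;
        if (calls == 1) {
            throw new Exception("simulated ConnectionLoss for " + path);
        }
    }
}

class EnsureDemo {
    // Calls ensure() three times, ignoring failures, and returns how many creation attempts were made.
    static int attempts(boolean rearmOnFailure) {
        FailsOnceCreator creator = new FailsOnceCreator();
        EnsureOnce ensure = new EnsureOnce(creator, "/test", rearmOnFailure);
        for (int i = 0; i < 3; i++) {
            try {
                ensure.ensure();
            } catch (Exception e) {
                // the first attempt fails; keep going, as repeated cache refreshes would
            }
        }
        return creator.calls;
    }

    public static void main(String[] args) {
        System.out.println("quoted logic: " + attempts(false) + " attempt(s)");  // 1
        System.out.println("re-arming fix: " + attempts(true) + " attempt(s)");  // 2
    }
}
```

      With rearmOnFailure set to false, only the first (failing) attempt is ever made, so the containers are never created. With it set to true, the next ensure() call retries and succeeds. Because ensure() is synchronized in this sketch, the re-arm in the catch block cannot race with another caller's compareAndSet.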
      Example of the exception:

      org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /test
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
      	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1073)
      	at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:274)
      	at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:199)
      	at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:193)
      	at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
      	at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:190)
      	at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:175)
      	at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
      	at org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:194)
      	at org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:61)
      	at org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53)
      	at org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:576)
      	at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:490)
      	at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)
      	at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      As a result, the watcher registered in org.apache.curator.framework.recipes.cache.PathChildrenCache#refresh is never triggered.

      Test to reproduce:

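      import org.apache.curator.framework.CuratorFramework;
      import org.apache.curator.framework.CuratorFrameworkFactory;
      import org.apache.curator.framework.recipes.cache.PathChildrenCache;
      import org.apache.curator.retry.RetryOneTime;
      import org.apache.curator.test.TestingServer;
      import org.junit.Test; // JUnit 4 assumed; the report does not name the test framework

      // Class name is illustrative; the report shows only the test method.
      public class PathChildrenCacheStartupTest {

          @Test
          public void test() throws Exception {
              // Create the server but do not start it, so ZooKeeper is unavailable at startup
              TestingServer zkTestServer = new TestingServer(2181, false);

              CuratorFramework curatorFramework = CuratorFrameworkFactory.newClient(
                      zkTestServer.getConnectString(),
                      5000,                   // session timeout (ms)
                      1000,                   // connection timeout (ms)
                      new RetryOneTime(100)   // a single retry after 100 ms, so operations fail quickly
              );
              curatorFramework.start();
              PathChildrenCache cache = new PathChildrenCache(curatorFramework, "/test", true);
              cache.start(PathChildrenCache.StartMode.POST_INITIALIZED_EVENT);

              // Give the initial refresh time to exhaust its retries while the server is down
              Thread.sleep(5000);

              // Bring ZooKeeper up and create a child of the cached path
              zkTestServer.start();
              curatorFramework.create().creatingParentContainersIfNeeded().forPath("/test/example");

              // Per this report, the cache never picks up /test/example, so the printed list stays empty.
              // Loops forever by design; stop the test manually.
              while (true) {
                  Thread.sleep(1000);
                  System.out.println(cache.getCurrentData());
              }
          }
      }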
      

        Activity

        There are no comments yet on this issue.

          People

          • Assignee:
            Unassigned
            Reporter:
            Dmitry Konstantinov (dnk)
          • Votes:
            0
            Watchers:
            1
