Uploaded image for project: 'Apache Curator'
  1. Apache Curator
  2. CURATOR-559

Inconsistent ZK timeouts

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 4.2.0, 4.3.0
    • 4.3.0, 5.0.0
    • Framework
    • None

    Description

      I've configured a reasonable timeout using BoundedExponentialBackoffRetry, and generally it works as I'd expect if ZK is down when I make a call like "create.forPath". But if ZK is unavailable when I call acquire on an InterProcessReadWriteLock, it takes far longer before it finally times out.

      I call acquire which is wrapped in "RetryLoop.callWithRetry" and it goes onto call findProtectedNodeInForeground which is also wrapped in "RetryLoop.callWithRetry". If I've configured the BoundedExponentialBackoffRetry to retry 20 times, the inner retry tries 20 times for every one of the 20 outer retry loops, so it retries 400 times.

       

      This class recreates it, if you put break points at the commented sections and bring ZK down you can see the different times until it disconnects and the stack traces which I've included below.

       

      public class GoCurator {
      public static void main(String[] args) throws Exception {
      
          CuratorFramework cf = CuratorFrameworkFactory.newClient(
                  "localhost:2181",
                  new BoundedExponentialBackoffRetry(200, 10000, 20)
          );
          cf.start();
      
          String root = "/myRoot";
          if(cf.checkExists().forPath(root) == null) {
              // Stacktrace A showing what happens if ZK is down for this call
              cf.create().forPath(root);
          }
      
          InterProcessReadWriteLock lcok = new InterProcessReadWriteLock(cf, "/grant/myLock");
      
          // See stacktrace B showing the nested re-try if ZK is down for this call
          lcok.readLock().acquire();
      
          lcok.readLock().release();
      
          System.out.println("done");
      } 

       

      Stacktrace A (if ZK is down when I'm calling create().forPath). This shows the single retry loop so it exist after the correct number of attempts:

       

       java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Object.java:-1)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1499)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1487)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2617)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
        at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
        at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
        at com.gebatech.curator.GoCurator.main(GoCurator.java:25) 

      Stacktrace B (if ZK is down when I call InterProcessReadWriteLock#readLock#acquire). This shows the nested re-try loop so it doesn't exit until 20*20 attempts.

       

       java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Unsafe.java:-1)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:434)
        at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:56)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
        at org.apache.curator.framework.imps.CreateBuilderImpl.findProtectedNodeInForeground(CreateBuilderImpl.java:1239)
        at org.apache.curator.framework.imps.CreateBuilderImpl.access$1700(CreateBuilderImpl.java:51)
        at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1167)
        at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156)
        at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
        at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153)
        at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:575)
        at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51)
        at org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver.createsTheLock(StandardLockInternalsDriver.java:54)
        at org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:237)
        at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:89)
        at com.gebatech.curator.GoCurator.main(GoCurator.java:29) 

      Attachments

        Activity

          People

            randgalt Jordan Zimmerman
            digbyg Grant Digby
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 1h 10m
                1h 10m