  1. Apache Curator
  2. CURATOR-466

LeaderSelector gets in an inconsistent state when releasing resources.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 4.0.1
    • Fix Version/s: None
    • Component/s: Recipes
    • Labels: None

    Description

      I'm using the leader election recipe, which works well until the application shuts down.

      Here is my example:

       

      CuratorFramework framework = CuratorFrameworkFactory.builder()
          .connectString("localhost:2181")
          .retryPolicy(new RetryOneTime(100))
          .build();
      
      LeaderSelector leaderSelector = new LeaderSelector(
          framework,
          "/path",
          new LeaderSelectorListener() {
              volatile boolean stopped;
              @Override
              public void takeLeadership(CuratorFramework client) throws Exception {
                  System.out.println("I'm a new leader!");
                  try {
                      while (!Thread.currentThread().isInterrupted() && !stopped) {
                          TimeUnit.SECONDS.sleep(1);
                      }
                  } finally {
                      System.out.println("I'm not a leader anymore..");
                  }
              }
      
              @Override
              public void stateChanged(CuratorFramework client, ConnectionState newState) {
                  if (client.getConnectionStateErrorPolicy().isErrorState(newState)) {
                      stopped = true;
                  }
              }
          }
      );
      
      framework.start();
      leaderSelector.start();
      
      TimeUnit.SECONDS.sleep(5);
      
      leaderSelector.close();   //(1)
      framework.close();        //(2)

       

      When I release resources by calling close first on the LeaderSelector instance and then on the CuratorFramework instance (the lines marked (1) and (2)), I always get the following exception:

       

      java.lang.IllegalStateException: instance must be started before calling this method
      at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[curator-client-4.0.1.jar:?]
      at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
      

       

      The reason for the exception is that the non-blocking LeaderSelector.close method delegates to the internal executor service, which abruptly cancels the running futures with the mayInterruptIfRunning flag set to true. Right after this, the CuratorFramework close method is called. Meanwhile, the future being canceled executes its finally block, where it calls methods on the already closed CuratorFramework instance, which throws the exception.
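The race described above can be reproduced without Curator or ZooKeeper. The following is a minimal sketch using only the JDK. FakeClient is a hypothetical stand-in for a client that rejects calls once it has been closed, and the clientClosed latch is an assumption added purely to make the losing side of the race deterministic. It is not Curator's implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

class CancelRaceSketch {

    /** Hypothetical stand-in for a client that rejects calls once closed. */
    static final class FakeClient {
        private final AtomicBoolean started = new AtomicBoolean(true);

        void delete() {
            if (!started.get()) {
                throw new IllegalStateException("instance must be started before calling this method");
            }
        }

        void close() {
            started.set(false);
        }
    }

    /** Returns the exception thrown by the task's cleanup, or null if cleanup succeeded. */
    static Throwable run() {
        FakeClient client = new FakeClient();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch working = new CountDownLatch(1);
        CountDownLatch clientClosed = new CountDownLatch(1);
        CountDownLatch cleanupDone = new CountDownLatch(1);
        AtomicReference<Throwable> cleanupError = new AtomicReference<>();
        try {
            Future<?> task = executor.submit(() -> {
                try {
                    working.countDown();
                    Thread.sleep(60_000); // simulated leadership work, ended by the interrupt
                } catch (InterruptedException e) {
                    // cancel(true) interrupted the work; fall through to cleanup
                } finally {
                    try {
                        // Assumption for determinism: force the cleanup to lose the race.
                        clientClosed.await();
                        client.delete();
                    } catch (Throwable t) {
                        cleanupError.set(t);
                    } finally {
                        cleanupDone.countDown();
                    }
                }
            });

            working.await();
            task.cancel(true);        // returns at once; does not wait for the finally block
            client.close();           // the client is closed while cleanup is still pending
            clientClosed.countDown();

            cleanupDone.await(5, TimeUnit.SECONDS);
            return cleanupError.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("cleanup error: " + run());
    }
}
```

The point of the sketch is that Future.cancel(true) only requests interruption and returns; nothing in it waits for the cancelled task's finally block, so any resource that block needs can already be gone.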

      I thought I could wait until the LeaderSelector instance is closed, so I tried adding a delay before closing the CuratorFramework instance, but doing so leads to another exception:

      java.lang.InterruptedException: null
      at java.lang.Object.wait(Native Method) ~[?:1.8.0_141]
      at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_141]
      at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) ~[zookeeper-3.4.12.jar:3.4.12--1]
      at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:874) ~[zookeeper-3.4.12.jar:3.4.12--1]
      at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[curator-client-4.0.1.jar:?]
      at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[curator-client-4.0.1.jar:?]
      at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34) ~[curator-framework-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
      at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
      

      This time the exception is caused by the future being canceled with the mayInterruptIfRunning flag set to true in the LeaderSelector close method.
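The effect of the interrupt on cleanup can also be shown in isolation. The sketch below uses only the JDK and assumes the worker's interrupt flag is still set when cleanup starts. blockingCleanup is a hypothetical stand-in for a synchronous call such as the ZooKeeper delete in the trace above. The second variant clears the flag before cleanup and restores it afterwards; this is one possible mitigation for illustration, not a description of what LeaderSelector does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class InterruptedCleanupSketch {

    /** Hypothetical blocking cleanup; a blocking call fails fast if the interrupt flag is set. */
    static void blockingCleanup() throws InterruptedException {
        Thread.sleep(10);
    }

    /** Cancels a running task with interruption and reports how its cleanup ended. */
    static String run(boolean clearFlagBeforeCleanup) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch working = new CountDownLatch(1);
        CompletableFuture<String> outcome = new CompletableFuture<>();
        try {
            Future<?> task = executor.submit(() -> {
                try {
                    working.countDown();
                    // Poll the interrupt flag without clearing it.
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.yield();
                    }
                } finally {
                    // Optionally clear the flag so the blocking cleanup can complete.
                    boolean wasInterrupted = clearFlagBeforeCleanup && Thread.interrupted();
                    try {
                        blockingCleanup();
                        outcome.complete("ok");
                    } catch (InterruptedException e) {
                        outcome.complete("InterruptedException");
                    } finally {
                        if (wasInterrupted) {
                            Thread.currentThread().interrupt(); // restore for the caller
                        }
                    }
                }
            });

            working.await();
            task.cancel(true); // sets the worker's interrupt flag
            return outcome.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("flag left set: " + run(false));
        System.out.println("flag cleared:  " + run(true));
    }
}
```

With the flag left set, the cleanup call throws InterruptedException regardless of how long the caller waits before closing the client, which matches the second stack trace.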

      As the LeaderSelector implementation is based on InterProcessMutex, which works with ephemeral nodes, do we really need to clean up manually on shutdown? As far as I know, ephemeral nodes are deleted when the client's session ends.

       

          People

            Assignee: Unassigned
            Reporter: Mikhail Pryakhin (m.pryahin)
            Votes: 3
            Watchers: 5
