Apache Curator / CURATOR-573

No leader is getting selected intermittently


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 4.0.1
    • Fix Version/s: None
    • Component/s: Apache, Framework, Recipes
    • Labels: None

    Description

      I am using the Apache Curator Leader Election recipe (https://curator.apache.org/curator-recipes/leader-election.html) in my application.

      Zookeeper version : 3.5.7
      Curator : 4.0.1

      Below is the sequence of steps:
      1. Whenever my Tomcat server instance starts up, I create a single CuratorFramework instance (one instance per Tomcat server) and start it:

      StartUp Code
      CuratorFramework client = CuratorFrameworkFactory.newClient(connectionString, retryPolicy);
      client.start();
      if (!client.blockUntilConnected(10, TimeUnit.MINUTES)) {
          LOGGER.error("Zookeeper connection could not establish!");
          throw new RuntimeException("Zookeeper connection could not establish");
      }
      

      2. Create an instance of LSAdapter and start it:

      LSAdapter initializing
      LSAdapter adapter = new LSAdapter(client, <some_metadata>);
      adapter.start();
      

      Below is my LSAdapter class:

      LSAdapter.java
      public class LSAdapter extends LeaderSelectorListenerAdapter implements Closeable {

          //<Class instance variables defined>

          public LSAdapter(CuratorFramework client, <some_metadata>) {
              leaderSelector = new LeaderSelector(client, <path_to_be_used_for_leader_election>, this);
              leaderSelector.autoRequeue();
          }

          public void start() throws IOException {
              leaderSelector.start();
          }

          @Override
          public void close() throws IOException {
              leaderSelector.close();
          }

          @Override
          public void takeLeadership(CuratorFramework client) throws Exception {
              final int waitSeconds = (int) (5 * Math.random()) + 1;

              LOGGER.info(name + " is now the leader. Waiting " + waitSeconds + " seconds...");
              LOGGER.debug(name + " has been leader " + leaderCount.getAndIncrement() + " time(s) before.");
              while (true) {
                  try {
                      Thread.sleep(TimeUnit.SECONDS.toMillis(waitSeconds));
                      //do leader tasks
                  } catch (InterruptedException e) {
                      LOGGER.error(name + " was interrupted.");
                      //cleanup

                      /*
                       * The code creates a znode here. If the client's current state is CLOSED,
                       * this call throws an exception and takeLeadership() exits. If the client
                       * state is STARTED, the znode should be created. When LSAdapter.close()
                       * is called, the client state will always be CLOSED at this point, so an
                       * exception is expected to be thrown.
                       */
                      ZookeeperUtil.createEphemeral(client, <some_path>);

                      Thread.currentThread().interrupt();
                  } finally {

                  }
              }
          }
      }
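
The control flow of the loop in takeLeadership() can be reproduced without Curator. Because the catch block re-sets the interrupt flag and the loop then continues, the next Thread.sleep() throws InterruptedException immediately, so the loop ends only when something inside the catch block throws. The sketch below is a minimal illustration under stated assumptions: the class InterruptLoopDemo, the method catchesUntilExit, and the IllegalStateException standing in for a failing ZookeeperUtil.createEphemeral() call are all hypothetical and not part of the application.

```java
public class InterruptLoopDemo {

    /**
     * Runs a loop shaped like the one in takeLeadership() on the current thread
     * and returns how many times the catch block ran before the loop exited.
     * failOnAttempt is a hypothetical knob: the attempt on which the stand-in
     * cleanup call throws (as createEphemeral would on a CLOSED client).
     */
    static int catchesUntilExit(int failOnAttempt) {
        int attempts = 0;
        // Simulate the leader thread being interrupted once.
        Thread.currentThread().interrupt();
        try {
            while (true) {
                try {
                    Thread.sleep(50);
                    // do leader tasks
                } catch (InterruptedException e) {
                    attempts++;
                    // Stand-in for the cleanup call; only a throw here ends the loop.
                    if (attempts >= failOnAttempt) {
                        throw new IllegalStateException("client closed (simulated)");
                    }
                    // Re-setting the flag makes the next sleep() throw immediately.
                    Thread.currentThread().interrupt();
                }
            }
        } catch (IllegalStateException exit) {
            return attempts;
        }
    }

    public static void main(String[] args) {
        // A single interrupt keeps the catch block running until cleanup fails.
        System.out.println("catch block ran " + catchesUntilExit(3) + " time(s)");
    }
}
```

If the stand-in cleanup never throws, this loop never returns; that mirrors the case where the client is still in the STARTED state when the interrupt arrives.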
      

      4. When the server instance shuts down, I close the LSAdapter instance (which the application is using) and then close the CuratorFramework client that was created:

      PreDestroy code
      CloseableUtils.closeQuietly(lsAdapter);
      curatorFrameworkClient.close();
      

      The issue I am facing is that, at times, when the server is restarted, no leader gets elected. I verified this by tracing the log statements inside takeLeadership(). I have two Tomcat server instances running the above code, both connected to the same ZooKeeper quorum. Most of the time one of the instances becomes leader, but when this issue occurs, both of them remain followers. Please suggest what I am doing wrong.
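
For comparison, the Curator leader election recipe documentation states that takeLeadership() should return only when leadership is being relinquished. A minimal sketch of a loop that returns as soon as its thread is interrupted is shown below. It assumes the selector interrupts the leader thread on close or connection loss; the class ExitOnInterruptDemo and its methods are hypothetical stand-ins, not Curator API and not the reporter's code.

```java
public class ExitOnInterruptDemo {

    /** Hypothetical stand-in for the body of takeLeadership(). */
    static void leaderLoop() {
        try {
            while (true) {
                Thread.sleep(20);
                // do leader tasks
            }
        } catch (InterruptedException e) {
            // Restore the flag for callers, then fall through and return,
            // which is what gives up leadership in this sketch.
            Thread.currentThread().interrupt();
        }
    }

    /** Starts the loop on a worker thread, interrupts it, and reports whether it exited. */
    static boolean exitsWhenInterrupted() throws InterruptedException {
        Thread worker = new Thread(ExitOnInterruptDemo::leaderLoop);
        worker.start();
        worker.interrupt();
        worker.join(2000);
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("loop exited after interrupt: " + exitsWhenInterrupted());
    }
}
```

The try block wraps the whole loop rather than a single iteration, so one interrupt is enough to leave it and no cleanup call has to fail for the method to return.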

       

          People

            Assignee: Unassigned
            Reporter: Viniti