HBase / HBASE-10310

ZNodeCleaner session expired for /hbase/master

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.96.1.1
    • Fix Version/s: 0.98.0, 0.96.2, 0.99.0
    • Component/s: master
    • Labels:
      None
    • Environment:

      x86_64 GNU/Linux

    • Hadoop Flags:
      Reviewed

      Description

      I was testing the "hbase master clear" command while working on HBASE-7386. Here are the command and the resulting exception:

      $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear
      
      14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=90000 watcher=clean znode for master, quorum=zk1:2181, baseZNode=/hbase
      14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181
      14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to server zk1/172.17.33.5:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
      14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to zk11/172.17.33.5:2181, initiating session
      14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated timeout = 40000
      14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed
      14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down
      14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
      14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0...
      14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
      14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts
      14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get data of znode /hbase/master
      org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
      	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
      	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
      	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
      	at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
      	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
      	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
      14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
      org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
      	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
      	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
      	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
      	at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
      	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
      	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
      14/01/10 14:05:45 WARN zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode
      org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
      	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
      	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
      	at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
      	at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
      	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
      	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
      

      After checking ZNodeClearer.java, I noticed these lines:

          try {
            znodeFileContent = ZNodeClearer.readMyEphemeralNodeOnDisk();
          } catch (FileNotFoundException fnfe) {
            // If no file, just keep going -- return success.
            LOG.warn("Can't find the znode file; presume non-fatal", fnfe);
            return true;
          } catch (IOException e) {
            LOG.warn("Can't read the content of the znode file", e);
            return false;
          } finally {
            zkw.close();
          }

          return MasterAddressTracker.deleteIfEquals(zkw, znodeFileContent);
        }
      

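      It looks like we are closing the ZooKeeper connection prematurely: the finally block calls zkw.close() before deleteIfEquals runs, so the getData call on /hbase/master fails with SessionExpiredException. After moving

       return MasterAddressTracker.deleteIfEquals(zkw, znodeFileContent); 

      inside the try block, the issue was fixed.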
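
      The ordering problem can be reproduced without HBase or ZooKeeper. The sketch below is only an illustration, not the attached patch: FakeSession and ClearOrderDemo are hypothetical stand-ins (they are not HBase classes), and the file read is replaced by a plain string argument. It contrasts closing the session in finally before the lookup with performing the lookup inside the try block.

```java
// Hypothetical stand-in for the ZooKeeper watcher; NOT an HBase class.
class FakeSession implements AutoCloseable {
    private boolean closed = false;
    private final String nodeData;

    FakeSession(String nodeData) {
        this.nodeData = nodeData;
    }

    // Mimics a read that fails once the session has been closed.
    String getData() {
        if (closed) {
            throw new IllegalStateException("Session expired");
        }
        return nodeData;
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ClearOrderDemo {
    // Stand-in for MasterAddressTracker.deleteIfEquals: needs a live session.
    static boolean deleteIfEquals(FakeSession session, String expected) {
        return expected.equals(session.getData());
    }

    // Shape of the reported code: finally closes the session,
    // and only afterwards is the session used.
    static boolean clearBuggy(FakeSession session, String fileContent) {
        String content;
        try {
            content = fileContent; // stands in for reading the znode file
        } finally {
            session.close();
        }
        return deleteIfEquals(session, content); // session already closed
    }

    // Shape of the described fix: the lookup happens inside try,
    // so finally closes the session only after the result is computed.
    static boolean clearFixed(FakeSession session, String fileContent) {
        try {
            String content = fileContent; // stands in for reading the znode file
            return deleteIfEquals(session, content);
        } finally {
            session.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed: " + clearFixed(new FakeSession("master-1"), "master-1"));
        try {
            clearBuggy(new FakeSession("master-1"), "master-1");
        } catch (IllegalStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }
    }
}
```

      In the fixed variant the return expression is evaluated before the finally block runs, so the session is still open during the lookup and is closed on every exit path.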

        Attachments

        1. HBASE-10310.patch
          2 kB
          Samir Ahmic

              People

              • Assignee: Samir Ahmic (asamir)
              • Reporter: Samir Ahmic (asamir)
              • Votes: 0
              • Watchers: 5