HBASE-8675

Two active HMasters after AUTH_FAILED in a secure HBase cluster


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: master
    • Labels: None

    Description

      In our production cluster, a network problem reaching the Kerberos server caused the ZooKeeperWatcher in the active HMaster to fail authentication. It received an AUTH_FAILED connection event and lost the master lock. However, the ZooKeeper watcher ignores this event, so the old active HMaster keeps acting as the active master. After the network problem was fixed, the backup HMaster acquired the master lock and became active, leaving two active HMasters in the cluster.

      2013-05-30 09:44:21,004 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: krb1.xiaomi.net)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

      2013-05-30 09:54:07,755 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x3e10d98be405bc Unable to set watcher on znode /hbase/master
      org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/master
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
      at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036)
      at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:166)
      at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:231)
      at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:595)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:850)
      at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:825)
      at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:286)
      at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:201)
      at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200)
      at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226)
      at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:705)
      at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
      at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168)
      at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123)
      at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134)
      at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92)
      at org.apache.hadoop.hbase.Chore.run(Chore.java:67)
      at java.lang.Thread.run(Thread.java:662)

      I would like to simply abort the HMaster when it receives an AuthFailed or SaslAuthenticated event. Is there a better idea for handling this issue?
      Since ZooKeeperWatcher is used in many classes, will aborting cause other problems? Is there anything else we need to consider?
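
      To make the proposal concrete, here is a minimal, self-contained sketch of the two behaviours: ignoring the AUTH_FAILED event versus aborting on it. It is not the attached patch and does not use the real HBase or ZooKeeper classes; ConnState, MasterStub, ConnectionWatcher and replay are hypothetical names made up for this illustration, and the master lock is reduced to a simple flag.

```java
public class AuthFailedAbortSketch {

    // Hypothetical stand-in for the client connection states; only the
    // states relevant to this report are modelled.
    enum ConnState { SYNC_CONNECTED, DISCONNECTED, AUTH_FAILED }

    // Hypothetical master model: it believes it is active until aborted.
    static final class MasterStub {
        private final String name;
        private boolean active;
        private boolean aborted;

        MasterStub(String name, boolean active) {
            this.name = name;
            this.active = active;
        }

        // Called when this master wins the master lock.
        void becomeActive() {
            if (!aborted) {
                active = true;
            }
        }

        void abort(String why) {
            aborted = true;
            active = false;
            System.out.println(name + " aborting: " + why);
        }

        boolean isActive() { return active; }
        boolean isAborted() { return aborted; }
    }

    // Hypothetical connection watcher. With abortOnAuthFailed == false it
    // drops the event, which is the behaviour described in this report.
    static final class ConnectionWatcher {
        private final MasterStub master;
        private final boolean abortOnAuthFailed;

        ConnectionWatcher(MasterStub master, boolean abortOnAuthFailed) {
            this.master = master;
            this.abortOnAuthFailed = abortOnAuthFailed;
        }

        void onConnectionEvent(ConnState state) {
            if (state == ConnState.AUTH_FAILED && abortOnAuthFailed) {
                master.abort("ZooKeeper connection entered AUTH_FAILED");
            }
            // Every other state is left alone in this sketch.
        }
    }

    // Replays the sequence from the report and returns how many masters
    // believe they are active at the end.
    static int replay(boolean abortOnAuthFailed) {
        MasterStub oldActive = new MasterStub("old-active", true);
        MasterStub backup = new MasterStub("backup", false);
        ConnectionWatcher watcher = new ConnectionWatcher(oldActive, abortOnAuthFailed);

        watcher.onConnectionEvent(ConnState.AUTH_FAILED); // Kerberos unreachable
        backup.becomeActive();                            // backup takes the master lock

        int activeCount = 0;
        if (oldActive.isActive()) activeCount++;
        if (backup.isActive()) activeCount++;
        return activeCount;
    }

    public static void main(String[] args) {
        System.out.println("ignore event: " + replay(false) + " active master(s)");
        System.out.println("abort on AUTH_FAILED: " + replay(true) + " active master(s)");
    }
}
```

      The sketch only covers the master itself. Whether other users of ZooKeeperWatcher should also abort on this event is exactly the open question above and is not answered here.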

      Attachments

        1. HBASE-8675-0.94-v1.patch
          2 kB
          Shaohui Liu


          People

            Assignee: Unassigned
            Reporter: Shaohui Liu (liushaohui)
            Votes: 1
            Watchers: 6
