HBase / HBASE-4857

Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.92.0, 0.94.0
    • Fix Version/s: 0.92.0
    • Component/s: security
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Looking through stack traces for TestMasterFailover, I see a case where the leader AuthenticationTokenSecretManager can get into a recursive loop when a KeeperException is encountered:

      "Thread-1-EventThread" daemon prio=10 tid=0x00007f9fb47b2800 nid=0x77f6 waiting on condition [0x00007f9fab376000]
         java.lang.Thread.State: TIMED_WAITING (sleeping)
              at java.lang.Thread.sleep(Native Method)
              at java.lang.Thread.sleep(Thread.java:302)
              at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
              at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
              at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
              at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
              at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
              at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
              at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
              at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
              at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
              at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
              at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
              at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
              at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
              at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
              at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
              at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
              at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
              at org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
              at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
              at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
              at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
      

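      The KeeperException causes ZKLeaderManager to call AuthenticationTokenSecretManager$LeaderElector.stop(), which calls ZKLeaderManager.stepDownAsLeader(), which encounters another KeeperException and calls stop() again. The two methods keep calling each other, as the repeated stop()/stepDownAsLeader() frames in the trace show.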
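
One common way to break this kind of mutual recursion is an idempotent stop() that returns immediately once it has already run. The sketch below only illustrates that idea and is not the attached patch: the class names Elector and LeaderManager are hypothetical stand-ins for LeaderElector and ZKLeaderManager, and an IllegalStateException stands in for the KeeperException so the example runs without a ZooKeeper dependency.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class StopGuardSketch {

    /** Hypothetical stand-in for ZKLeaderManager; stepping down always fails here. */
    static class LeaderManager {
        private final Elector elector;
        final AtomicInteger stepDownCalls = new AtomicInteger();

        LeaderManager(Elector elector) {
            this.elector = elector;
        }

        void stepDownAsLeader() {
            stepDownCalls.incrementAndGet();
            try {
                // Simulates the ZooKeeper call failing (a KeeperException in HBase).
                throw new IllegalStateException("simulated ZooKeeper failure");
            } catch (IllegalStateException e) {
                // The error path calls back into the elector, as in the stack trace.
                elector.stop("error removing leader node: " + e.getMessage());
            }
        }
    }

    /** Hypothetical stand-in for AuthenticationTokenSecretManager$LeaderElector. */
    static class Elector {
        private final AtomicBoolean stopped = new AtomicBoolean(false);
        LeaderManager manager;

        void stop(String reason) {
            // Guard: only the first call proceeds; re-entrant calls return at once.
            if (!stopped.compareAndSet(false, true)) {
                return;
            }
            manager.stepDownAsLeader();
        }
    }

    /** Wires the two objects together, stops the elector, and reports the call count. */
    static int stepDownCallsAfterStop() {
        Elector elector = new Elector();
        LeaderManager manager = new LeaderManager(elector);
        elector.manager = manager;
        elector.stop("shutting down");
        return manager.stepDownCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("stepDownAsLeader calls: " + stepDownCallsAfterStop());
    }
}
```

Without the compareAndSet guard, the same wiring would recurse until the stack overflowed; with it, stepDownAsLeader() runs once and the nested stop() call is a no-op.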

      1. HBASE-4857.patch
        0.7 kB
        Gary Helmling

        Activity

        Gary Helmling created issue
        Gary Helmling made changes
          Attachment: (none) -> HBASE-4857.patch [ 12504898 ]
        Ted Yu made changes
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Ted Yu made changes
          Assignee: (none) -> Gary Helmling [ ghelmling ]
        Andrew Purtell made changes
          Priority: Major [ 3 ] -> Critical [ 2 ]
        Gary Helmling made changes
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Hadoop Flags: (none) -> Reviewed [ 10343 ]
          Resolution: (none) -> Fixed [ 1 ]
        Lars Francke made changes
          Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Gary Helmling
          • Reporter: Gary Helmling
          • Votes: 0
          • Watchers: 1
