HBase / HBASE-10335

AuthFailedException in zookeeper may block replication forever

Details

    • Reviewed

    Description

      ReplicationSource rechooses sinks when it encounters exceptions while shipping edits to the current sink. But if the ZooKeeper client for the peer cluster goes into the AUTH_FAILED state, the ReplicationSource will always get AuthFailedException. The ReplicationSource does not reconnect to the peer, because reconnectPeer only handles ConnectionLossException and SessionExpiredException. As a result, replication keeps printing the following log lines:

      2014-01-14,12:07:06,892 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # 20
      2014-01-14,12:07:06,892 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: 20 has 0 region servers

      and is blocked forever.
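
      The failure mode described above can be illustrated with a small, self-contained sketch. It does not use the real HBase or ZooKeeper classes and is not the attached patch: the exception types, FakePeer, and chooseSinks are hypothetical stand-ins. It assumes that recreating the peer's ZooKeeper client clears the AUTH_FAILED state, and it only shows why a reconnect check limited to connection loss and session expiry never recovers, while one that also covers the auth failure does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class PeerReconnectSketch {
    // Hypothetical stand-ins for ZooKeeper's KeeperException subclasses.
    static class KeeperError extends Exception {}
    static class ConnectionLoss extends KeeperError {}
    static class SessionExpired extends KeeperError {}
    static class AuthFailed extends KeeperError {}

    // Reconnect check as described in the report: AuthFailed is not covered.
    static boolean reconnectsBefore(KeeperError e) {
        return e instanceof ConnectionLoss || e instanceof SessionExpired;
    }

    // One possible direction for a fix (an assumption): also reconnect on AuthFailed.
    static boolean reconnectsProposed(KeeperError e) {
        return reconnectsBefore(e) || e instanceof AuthFailed;
    }

    // Hypothetical peer whose client stays in AUTH_FAILED until it is recreated.
    static final class FakePeer {
        private boolean authFailed = true;

        List<String> listRegionServers() throws KeeperError {
            if (authFailed) {
                throw new AuthFailed();
            }
            return Arrays.asList("rs1", "rs2");
        }

        void reconnect() {
            authFailed = false; // a fresh client starts outside AUTH_FAILED
        }
    }

    // Tries to pick sinks up to maxAttempts times; returns the number of
    // region servers found, or 0 if every attempt failed.
    static int chooseSinks(FakePeer peer, Predicate<KeeperError> shouldReconnect, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return peer.listRegionServers().size();
            } catch (KeeperError e) {
                if (shouldReconnect.test(e)) {
                    peer.reconnect();
                }
                // Otherwise the same broken client is reused on the next attempt.
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int before = chooseSinks(new FakePeer(), PeerReconnectSketch::reconnectsBefore, 5);
        int proposed = chooseSinks(new FakePeer(), PeerReconnectSketch::reconnectsProposed, 5);
        System.out.println("without AuthFailed handling: " + before + " region servers");
        System.out.println("with AuthFailed handling:    " + proposed + " region servers");
    }
}
```

      In the sketch, the first call keeps reusing the broken client and never finds a region server, which corresponds to the repeated "has 0 region servers" log lines; the second call recovers after one reconnect.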

      I think other places may have the same problem of not handling AuthFailedException from ZooKeeper, e.g. HBASE-8675.
      apurtell

      Attachments

        1. HBASE-10335-v2.diff
          2 kB
          Shaohui Liu
        2. HBASE-10335-v1.diff
          2 kB
          Shaohui Liu

          People

            liushaohui Shaohui Liu