  1. HBase
  2. HBASE-25627

HBase replication should have a metric to represent if the source is stuck getting initialized

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.5, 2.4.3
    • Fix Version/s: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.4.3
    • Component/s: Replication
    • Labels:
      None

      Description

      There can be situations where the cluster is not able to talk to the peer cluster's ZooKeeper (ZK). In that case the logQueue keeps accumulating on the source, but without digging into the logs we cannot tell why it is growing.

      Since the replication source doesn't even start the shipper in this case, it would be good to have a dedicated metric that indicates when the RS cannot talk to the peer's ZK at all.

       

      2021-03-03 04:02:10,704 DEBUG [peerId] zookeeper.RecoverableZooKeeper - Possibly transient ZooKeeper, quorum=zookeeper-0.zookeeper-a.fakeAddress:2181,zookeeper-1.zookeeper-a.fakeAddress:2181,zookeeper-2.zookeeper-a.fakeAddress:2181,zookeeper-3.zookeeper-a.fakeAddress:2181,zookeeper-4.zookeeper-a.fakeAddress:2181, exception=org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
      org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1119)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:284)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:469)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.getUUIDForCluster(ZKClusterId.java:96)
        at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.getPeerUUID(HBaseReplicationEndpoint.java:104)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:306)
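
      The proposal above could be sketched roughly as follows. This is only an illustration, not the actual HBase patch: the class SourceInitMetricSketch, the InitializingGauge type, and the initialize method are hypothetical names, and the bounded retry count stands in for whatever retry and backoff policy the real replication source uses. The gauge is raised the first time the peer cluster id cannot be read and lowered once initialization succeeds, so a non-zero value means a source is stuck before the shipper has started.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch of an "initializing" gauge; not the real HBase code. */
public class SourceInitMetricSketch {

  /** Gauge counting replication sources that are stuck in initialization. */
  public static final class InitializingGauge {
    private final AtomicInteger initializing = new AtomicInteger();

    public void incr() { initializing.incrementAndGet(); }
    public void decr() { initializing.decrementAndGet(); }
    public int get() { return initializing.get(); }
  }

  /**
   * Tries to read the peer cluster id up to maxAttempts times.
   * peerIdReader is an assumed stand-in for the call that contacts the peer's ZK.
   * Returns the id on success, or null if every attempt failed.
   */
  public static String initialize(Supplier<String> peerIdReader,
                                  InitializingGauge gauge,
                                  int maxAttempts) {
    boolean flagged = false; // true once this source has raised the gauge
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      String peerClusterId = null;
      try {
        peerClusterId = peerIdReader.get();
      } catch (RuntimeException e) {
        // For example, the peer ZK is unreachable or authentication failed.
      }
      if (peerClusterId != null) {
        if (flagged) {
          gauge.decr(); // initialization finally succeeded, clear the signal
        }
        return peerClusterId;
      }
      if (!flagged) {
        gauge.incr(); // raise the gauge once per stuck source
        flagged = true;
      }
    }
    return null; // still stuck, so the gauge stays raised
  }
}
```

      With a reader that always fails, the gauge stays at 1 after the attempts run out, which is the signal an operator would alert on instead of inferring the problem from a growing logQueue.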
      

            People

            • Assignee:
              sandeep.pal Sandeep Pal
              Reporter:
              sandeep.pal Sandeep Pal
