HBASE-10785

Meta's own location should be cached


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, hbase-10070
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      With the ROOT table gone, we no longer cache the location of the meta table (in MetaCache) in 0.96+. I've checked the 0.94 code, and there we cache meta, but not root.

      However, not caching meta's own location means that we do a ZooKeeper request every time we want to look up a region's location in meta. As a result, there is a significant spike in ZK requests whenever a region server goes down.

      This affects trunk, 0.98, and 0.96, as well as the hbase-10070 branch. I discovered the issue in hbase-10070 because the integration test (HBASE-10572) resulted in 150K requests to ZK in 10 minutes.

      A thread dump from one of the runs has 100+ client threads in this stack trace:

      	"TimeBoundedMultiThreadedReaderThread_20" prio=10 tid=0x00007f852c2f2000 nid=0x57b6 in Object.wait() [0x00007f85059e7000]
      	   java.lang.Thread.State: WAITING (on object monitor)
      		at java.lang.Object.wait(Native Method)
      		at java.lang.Object.wait(Object.java:503)
      		at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
      		- locked <0x00000000ea71aa78> (a org.apache.zookeeper.ClientCnxn$Packet)
      		at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
      		at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
      		at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
      		at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1853)
      		at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:186)
      		at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1112)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1220)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1129)
      		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:321)
      		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.call(RpcRetryingCallerWithReadReplicas.java:257)
      		- locked <0x00000000e9bcf238> (a org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas)
      		at org.apache.hadoop.hbase.client.HTable.get(HTable.java:818)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.queryKey(MultiThreadedReader.java:288)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.readKey(MultiThreadedReader.java:249)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.runReader(MultiThreadedReader.java:192)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.run(MultiThreadedReader.java:150)
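
      The remedy the summary points at is straightforward: look meta's location up in ZooKeeper once, keep it in a client-side cache, and go back to ZooKeeper only after the cached entry is invalidated (for example, when an RPC against the cached server fails). Below is a minimal sketch of that idea in Java; the Registry and MetaLocation types here are hypothetical stand-ins for ZooKeeperRegistry.getMetaRegionLocation() and HRegionLocation from the stack trace above, not the code from the attached patches.

	import java.util.concurrent.atomic.AtomicReference;

	public class CachedMetaLocator {

	  /** Stand-in for org.apache.hadoop.hbase.HRegionLocation. */
	  public interface MetaLocation {
	    String getHostnamePort();
	  }

	  /** Stand-in for the ZooKeeper-backed lookup seen in ZooKeeperRegistry. */
	  public interface Registry {
	    MetaLocation getMetaRegionLocation() throws Exception;
	  }

	  private final Registry registry;
	  private final AtomicReference<MetaLocation> cached = new AtomicReference<>();

	  public CachedMetaLocator(Registry registry) {
	    this.registry = registry;
	  }

	  /** Serves meta's location from the cache; asks ZooKeeper only on a miss. */
	  public MetaLocation locateMeta() throws Exception {
	    MetaLocation loc = cached.get();
	    if (loc == null) {
	      // Cache miss: one ZK round trip. Concurrent missers may each ask ZK,
	      // which is still far cheaper than every lookup doing so.
	      loc = registry.getMetaRegionLocation();
	      cached.set(loc);
	    }
	    return loc;
	  }

	  /** Call when an RPC against the cached server fails (e.g. the RS died). */
	  public void clearCache() {
	    cached.set(null); // next locateMeta() re-reads ZooKeeper
	  }
	}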

      Attachments

        1. 0034-HBASE-10785-Metas-own-location-should-be-cached.patch
          3 kB
          Enis Soztutar
        2. hbase-10785_v1.patch
          3 kB
          Enis Soztutar
        3. hbase-10785_v2.patch
          4 kB
          Enis Soztutar
        4. hbase-10785_v3.patch
          3 kB
          Enis Soztutar


            People

              Assignee: Enis Soztutar
              Reporter: Enis Soztutar
              Votes: 0
              Watchers: 10
