HBase
  1. HBase
  2. HBASE-10785

Metas own location should be cached

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, hbase-10070
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      With ROOT table gone, we no longer cache the location of the meta table (in MetaCache) in 96+. I've checked 94 code, and there we cache meta, but not root.

      However, not caching the metas own location means that we are doing a zookeeper request every time we want to look up a regions location from meta. This means that there is a significant spike in zk requests whenever a region server goes down.

      This affects trunk,0.98 and 0.96 as well as hbase-10070 branch. I've discovered the issue in hbase-10070 because of the integration test (HBASE-10572) results in 150K requests to zk in 10min.

      A thread dump from one of the runs have 100+ threads from client in this stack trace:

      	"TimeBoundedMultiThreadedReaderThread_20" prio=10 tid=0x00007f852c2f2000 nid=0x57b6 in Object.wait() [0x00007f85059e7000]
      	   java.lang.Thread.State: WAITING (on object monitor)
      		at java.lang.Object.wait(Native Method)
      		at java.lang.Object.wait(Object.java:503)
      		at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
      		- locked <0x00000000ea71aa78> (a org.apache.zookeeper.ClientCnxn$Packet)
      		at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1149)
      		at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
      		at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
      		at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:1853)
      		at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:186)
      		at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:60)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1112)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1220)
      		at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1129)
      		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:321)
      		at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.call(RpcRetryingCallerWithReadReplicas.java:257)
      		- locked <0x00000000e9bcf238> (a org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas)
      		at org.apache.hadoop.hbase.client.HTable.get(HTable.java:818)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.queryKey(MultiThreadedReader.java:288)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.readKey(MultiThreadedReader.java:249)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.runReader(MultiThreadedReader.java:192)
      		at org.apache.hadoop.hbase.util.MultiThreadedReader$HBaseReaderThread.run(MultiThreadedReader.java:150)
      	
      1. hbase-10785_v1.patch
        3 kB
        Enis Soztutar
      2. hbase-10785_v2.patch
        4 kB
        Enis Soztutar
      3. hbase-10785_v3.patch
        3 kB
        Enis Soztutar
      4. 0034-HBASE-10785-Metas-own-location-should-be-cached.patch
        3 kB
        Enis Soztutar

        Issue Links

          Activity

          Enis Soztutar created issue -
          Enis Soztutar made changes -
          Field Original Value New Value
          Link This issue is related to HBASE-10070 [ HBASE-10070 ]
          Enis Soztutar made changes -
          Link This issue is related to HBASE-10701 [ HBASE-10701 ]
          Hide
          Enis Soztutar added a comment -

          Here is a patch which applies to hbase-10070 branch. I can do a trunk patch as well, if we agree on the approach.

          Show
          Enis Soztutar added a comment - Here is a patch which applies to hbase-10070 branch. I can do a trunk patch as well, if we agree on the approach.
          Enis Soztutar made changes -
          Attachment hbase-10785_v1.patch [ 12635431 ]
          Hide
          Enis Soztutar added a comment -

          Here is a v2 patch, which adds a safeguard for the meta location cache.

          Show
          Enis Soztutar added a comment - Here is a v2 patch, which adds a safeguard for the meta location cache.
          Enis Soztutar made changes -
          Attachment hbase-10785_v2.patch [ 12636221 ]
          Hide
          Devaraj Das added a comment -

          Looks good. Some nits:
          1. We don't need the config (once we have tested this on real clusters)
          2. The tableName argument in locateRegion is always going to be META. Might as well not pass it and just declare it within the method.

          Show
          Devaraj Das added a comment - Looks good. Some nits: 1. We don't need the config (once we have tested this on real clusters) 2. The tableName argument in locateRegion is always going to be META. Might as well not pass it and just declare it within the method.
          Hide
          stack added a comment -

          This should be final?

          + private boolean cacheMetaLocationEnabled; // whether to cache meta's own location

          Can fix on commit.

          Is there code duplication in locateMeta? If so, does there have to be (no biggie.. just asking).

          Else, nice find.

          Show
          stack added a comment - This should be final? + private boolean cacheMetaLocationEnabled; // whether to cache meta's own location Can fix on commit. Is there code duplication in locateMeta? If so, does there have to be (no biggie.. just asking). Else, nice find.
          Hide
          Nicolas Liochon added a comment -

          Nice catch

          +        // If we are not supposed to be using the cache, delete any existing cached location
          +        // so it won't interfere.
          +        metaCache.clearCache(tableName, metaCacheKey, replicaId);
          +      }
          

          I know it's a copy paste; but I don't think we should do that: often the second try is w/o cache to be sure, but trashing the cache for the others is bad, as the default for a second try is nearly always to bypass the cache, whatever the reason.

          I think this patch should work, as in the recent past we had a cache for meta.

          Show
          Nicolas Liochon added a comment - Nice catch + // If we are not supposed to be using the cache, delete any existing cached location + // so it won't interfere. + metaCache.clearCache(tableName, metaCacheKey, replicaId); + } I know it's a copy paste; but I don't think we should do that: often the second try is w/o cache to be sure, but trashing the cache for the others is bad, as the default for a second try is nearly always to bypass the cache, whatever the reason. I think this patch should work, as in the recent past we had a cache for meta.
          Hide
          Nicolas Liochon added a comment -

          Hi Enis Soztutar, do you plan to commit this one? It seems important. Note that in HBASE-10018, I removed the "delete any existing cached location so it won't interfere."

          Show
          Nicolas Liochon added a comment - Hi Enis Soztutar , do you plan to commit this one? It seems important. Note that in HBASE-10018 , I removed the "delete any existing cached location so it won't interfere."
          Hide
          Enis Soztutar added a comment -

          Indeed I was looking at this. I need to address the comments real quick.
          We have been running a lot of tests with this, so maybe Devaraj is right that we can remove the conf parameter.

          Show
          Enis Soztutar added a comment - Indeed I was looking at this. I need to address the comments real quick. We have been running a lot of tests with this, so maybe Devaraj is right that we can remove the conf parameter.
          Hide
          Nicolas Liochon added a comment -

          Yes, I agree as well, we should be better without the parameter.

          Show
          Nicolas Liochon added a comment - Yes, I agree as well, we should be better without the parameter.
          Hide
          Enis Soztutar added a comment -

          Attaching v3 patch which removes the conf param, and cache clear on useCache==false.

          Is there code duplication in locateMeta? If so, does there have to be (no biggie.. just asking).

          There is some between locateMeta and locateRegionInMeta, but adding this logic in locateRegionInMeta would make it even more complex. I think it should be fine.

          I know it's a copy paste; but I don't think we should do that: often the second try is w/o cache to be sure, but trashing the cache for the others is bad, as the default for a second try is nearly always to bypass the cache

          I think we can go either ways. The nice part about removing from cache is that one thread already knows that the location that is cached is no good, so it just removes it so that other threads will wait for this to finish the lookup of the new location. On some cases, this will save unnecessary trips to the bad location (and possible socket timeouts), while on other cases, a retry from a thread will stall the other lookup. v3 patch removes this cache invalidation though.

          Show
          Enis Soztutar added a comment - Attaching v3 patch which removes the conf param, and cache clear on useCache==false. Is there code duplication in locateMeta? If so, does there have to be (no biggie.. just asking). There is some between locateMeta and locateRegionInMeta, but adding this logic in locateRegionInMeta would make it even more complex. I think it should be fine. I know it's a copy paste; but I don't think we should do that: often the second try is w/o cache to be sure, but trashing the cache for the others is bad, as the default for a second try is nearly always to bypass the cache I think we can go either ways. The nice part about removing from cache is that one thread already knows that the location that is cached is no good, so it just removes it so that other threads will wait for this to finish the lookup of the new location. On some cases, this will save unnecessary trips to the bad location (and possible socket timeouts), while on other cases, a retry from a thread will stall the other lookup. v3 patch removes this cache invalidation though.
          Enis Soztutar made changes -
          Attachment hbase-10785_v3.patch [ 12639410 ]
          Enis Soztutar made changes -
          Fix Version/s hbase-10070 [ 12326176 ]
          Hide
          Enis Soztutar added a comment -

          I think I've got +1s for earlier versions. The v3 does not add anything new. I'll commit this tomorrow unless objection.

          Show
          Enis Soztutar added a comment - I think I've got +1s for earlier versions. The v3 does not add anything new. I'll commit this tomorrow unless objection.
          Hide
          Enis Soztutar added a comment -

          Committed this to hbase-10070 branch. We would like to get the branch merged back soon to trunk sooner rather than later, so we will get this in trunk as well.

          Show
          Enis Soztutar added a comment - Committed this to hbase-10070 branch. We would like to get the branch merged back soon to trunk sooner rather than later, so we will get this in trunk as well.
          Enis Soztutar made changes -
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          Enis Soztutar made changes -
          Hide
          Enis Soztutar added a comment -

          Attaching rebased patch for master that is committed

          Show
          Enis Soztutar added a comment - Attaching rebased patch for master that is committed
          Hide
          Enis Soztutar added a comment -

          Committed to master as part of hbase-10070 branch merge

          Show
          Enis Soztutar added a comment - Committed to master as part of hbase-10070 branch merge
          Enis Soztutar made changes -
          Fix Version/s 0.99.0 [ 12325675 ]
          Hide
          Hudson added a comment -

          FAILURE: Integrated in HBase-TRUNK #5245 (See https://builds.apache.org/job/HBase-TRUNK/5245/)
          HBASE-10785 Metas own location should be cached (enis: rev 48ffa4d5e615c78f6db8f6c2beddd93460887642)

          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
            HBASE-11332 Fix for metas location cache from HBASE-10785 (enis: rev 14a09e79bdb9b925ce9548b45fc4e26caadaec07)
          • hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
          Show
          Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #5245 (See https://builds.apache.org/job/HBase-TRUNK/5245/ ) HBASE-10785 Metas own location should be cached (enis: rev 48ffa4d5e615c78f6db8f6c2beddd93460887642) hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java HBASE-11332 Fix for metas location cache from HBASE-10785 (enis: rev 14a09e79bdb9b925ce9548b45fc4e26caadaec07) hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
          Hide
          Enis Soztutar added a comment -

          Closing this issue after 0.99.0 release.

          Show
          Enis Soztutar added a comment - Closing this issue after 0.99.0 release.
          Enis Soztutar made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Enis Soztutar
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development