  1. HBase
  2. HBASE-16287

LruBlockCache size should not exceed acceptableSize too many


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 1.2.3, 2.0.0
    • Component/s: BlockCache
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      To keep the block cache from growing far beyond its acceptable size, this change adds the configuration "hbase.lru.blockcache.hard.capacity.limit.factor", which decides whether a block may be put into LruBlockCache. The factor defaults to 1.2.
      If the block cache size >= factor * acceptableSize, the block is rejected and not cached.

    Description

      Our regionserver is configured as below:
      -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
      We use only the block cache and set hfile.block.cache.size = 0.3 in hbase-site.xml, so under this configuration the LRU block cache size is (32g-1g)*0.3=9.3g. In some scenarios, however, some of the regionservers go into continuous Full GC for hours and, most importantly, most of the objects in the old generation are not collected after a Full GC. We dumped the heap, analyzed it with MAT, and observed an obvious memory leak in LruBlockCache, which occupied about 16g of memory. We then set the log level of class LruBlockCache to TRACE and observed this in the log:


      2016-07-22 12:17:58,158 INFO [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=15.29 GB, freeSize=-5.99 GB, max=9.30 GB, blockCount=628182, accesses=101799469125, hits=93517800259, hitRatio=91.86%, , cachingAccesses=99462650031, cachingHits=93468334621, cachingHitsRatio=93.97%, evictions=238199, evicted=4776350518, evictedPerRun=20051.93359375
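
      The sizing arithmetic in the description and the figures in the log line above can be cross-checked with a few lines of Java. This is only a sketch: the class and method names are hypothetical, and the reading that the subtracted 1g is one survivor space (young generation / (SurvivorRatio + 2)) is an assumption, not something the report states.

```java
import java.util.Locale;

public class CacheNumbers {
    static final long GB = 1024L * 1024 * 1024;

    // Assumption: one survivor space = young generation / (SurvivorRatio + 2).
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Cache capacity as computed in the report: (heap - 1g) * hfile.block.cache.size.
    static double cacheBytes(long heapBytes, long survivorBytes, double fraction) {
        return (heapBytes - survivorBytes) * fraction;
    }

    // freeSize in the log line is max minus totalSize, so it goes negative on overshoot.
    static double freeSize(double max, double total) {
        return max - total;
    }

    static double hitRatio(long hits, long accesses) {
        return (double) hits / accesses;
    }

    public static void main(String[] args) {
        long survivor = survivorBytes(4 * GB, 2);           // -Xmn4g, SurvivorRatio=2
        double cache = cacheBytes(32 * GB, survivor, 0.3);  // -Xmx32g, 0.3
        System.out.printf(Locale.ROOT, "survivor=%.1f GB, cache=%.1f GB%n",
                (double) survivor / GB, cache / GB);
        // prints: survivor=1.0 GB, cache=9.3 GB

        // Values copied from the log line: max=9.30 GB, totalSize=15.29 GB.
        System.out.printf(Locale.ROOT, "freeSize=%.2f GB, totalSize/max=%.2f%n",
                freeSize(9.30, 15.29), 15.29 / 9.30);
        // prints: freeSize=-5.99 GB, totalSize/max=1.64

        System.out.printf(Locale.ROOT, "hitRatio=%.2f%%%n",
                100 * hitRatio(93517800259L, 101799469125L));
        // prints: hitRatio=91.86%
    }
}
```

      The cache is holding roughly 1.64 times its configured maximum, well above the 1.2 threshold proposed later in this report.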

      We can see that the block cache size has far exceeded acceptableSize, which makes the Full GC problem worse.
      After some investigation, I found this function:

        public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
            final boolean cacheDataInL1) {
      

      It puts the block into the cache no matter how much of the block cache is already in use. If the eviction thread is not fast enough, the block cache size grows significantly.
      So I think we should add a check here: for example, if the block cache size > 1.2 * acceptableSize(), just return and do not cache the block until the block cache size is back under the watermark. If this is reasonable, I can make a small patch for it.
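
      The proposed check can be sketched as a toy cache. This is not the attached patch and not the real LruBlockCache: the class HardLimitCacheSketch, its fields, and the string keys are hypothetical, and eviction is a manual call rather than a background thread. It only illustrates rejecting a block once the current size reaches factor * acceptableSize.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HardLimitCacheSketch {
    private final long maxSize;
    private final double acceptableFactor;  // hypothetical parameter for this sketch
    private final double hardLimitFactor;   // e.g. 1.2, as proposed in the report
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    private final AtomicLong size = new AtomicLong();

    public HardLimitCacheSketch(long maxSize, double acceptableFactor, double hardLimitFactor) {
        this.maxSize = maxSize;
        this.acceptableFactor = acceptableFactor;
        this.hardLimitFactor = hardLimitFactor;
    }

    long acceptableSize() {
        return (long) Math.floor(maxSize * acceptableFactor);
    }

    long currentSize() {
        return size.get();
    }

    // Returns false when the block is rejected because the cache is over the hard limit.
    public boolean cacheBlock(String key, byte[] block) {
        long hardLimit = (long) (hardLimitFactor * acceptableSize());
        if (size.get() >= hardLimit) {
            return false;  // eviction has fallen behind: do not grow any further
        }
        if (map.putIfAbsent(key, block) == null) {
            size.addAndGet(block.length);
        }
        return true;
    }

    // Stand-in for the eviction thread: removes one block and releases its bytes.
    public void evict(String key) {
        byte[] removed = map.remove(key);
        if (removed != null) {
            size.addAndGet(-removed.length);
        }
    }

    public static void main(String[] args) {
        HardLimitCacheSketch cache = new HardLimitCacheSketch(1000, 1.0, 1.2);
        for (String key : new String[] {"a", "b", "c", "d"}) {
            System.out.println(key + " cached: " + cache.cacheBlock(key, new byte[400]));
        }
        cache.evict("a");
        System.out.println("d cached after eviction: " + cache.cacheBlock("d", new byte[400]));
    }
}
```

      Because the size is checked before the insert, the cache can still overshoot the hard limit by up to one block per concurrent caller; the check bounds the growth rather than making the limit exact.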

      Attachments

        1. HBASE-16287-v1.patch
          2 kB
          Yu Sun
        2. HBASE-16287-v2.patch
          7 kB
          Yu Sun
        3. HBASE-16287-v3.patch
          7 kB
          Yu Sun
        4. HBASE-16287-v4.patch
          7 kB
          Yu Sun
        5. HBASE-16287-v5.patch
          7 kB
          Yu Sun
        6. HBASE-16287-v6.patch
          7 kB
          Yu Sun
        7. HBASE-16287-v7.patch
          7 kB
          Yu Sun
        8. HBASE-16287-v8.patch
          7 kB
          Yu Sun
        9. HBASE-16287-v9.patch
          8 kB
          Yu Sun

            People

              haoran Yu Sun
              haoran Yu Sun