HBase / HBASE-28596

Optimise BucketCache usage upon region splits/merges.


Details

      This adds a new configuration property, "hbase.rs.evictblocksonsplit", with a default value of true, which causes all parent region blocks to be evicted on split.

      It also changes the behaviour introduced by HBASE-27474 to allow prefetch to run on the daughters' refs (if hbase.rs.prefetchblocksonopen is true).

      It also changes how BucketCache deals with blocks from reference files:
      1) When adding a block for a reference file, it first resolves the reference and checks whether the related block from the parent file is already in the cache. If so, it doesn't add the block to the cache. Otherwise, it adds the block with the reference as the cache key.
      2) When searching for a block from a reference file in the cache, it first resolves the reference and checks for the block from the original file, returning it if found. Otherwise, it searches the cache again, now using the reference file as the cache key.
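
The two rules above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the BucketCache implementation: the class `RefAwareCacheSketch`, the `file@offset` key format, and the `registerReference` map used to stand in for reference resolution are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a block cache that resolves reference files before caching or lookup. */
public class RefAwareCacheSketch {
    // Cache key -> block payload. The "file@offset" key format is an assumption.
    private final Map<String, byte[]> blocks = new HashMap<>();
    // Reference file -> parent file. Hypothetical stand-in for resolving a split reference.
    private final Map<String, String> refToParent = new HashMap<>();

    private static String key(String file, long offset) {
        return file + "@" + offset;
    }

    public void registerReference(String refFile, String parentFile) {
        refToParent.put(refFile, parentFile);
    }

    /** Rule 1: skip the add when the parent's block is already cached. Returns true if stored. */
    public boolean cacheBlock(String file, long offset, byte[] data) {
        String parent = refToParent.get(file);
        if (parent != null && blocks.containsKey(key(parent, offset))) {
            return false; // the parent's block already serves this reference
        }
        blocks.put(key(file, offset), data);
        return true;
    }

    /** Rule 2: look under the parent's key first, then fall back to the reference's own key. */
    public byte[] getBlock(String file, long offset) {
        String parent = refToParent.get(file);
        if (parent != null) {
            byte[] fromParent = blocks.get(key(parent, offset));
            if (fromParent != null) {
                return fromParent;
            }
        }
        return blocks.get(key(file, offset));
    }

    public int size() {
        return blocks.size();
    }

    public static void main(String[] args) {
        RefAwareCacheSketch cache = new RefAwareCacheSketch();
        cache.cacheBlock("parent-hfile", 0L, new byte[] {1});
        cache.registerReference("daughter-ref", "parent-hfile");
        boolean stored = cache.cacheBlock("daughter-ref", 0L, new byte[] {1});
        System.out.println("stored under ref key: " + stored);
        System.out.println("hit via ref: " + (cache.getBlock("daughter-ref", 0L) != null));
        System.out.println("cache size: " + cache.size());
    }
}
```

In this model, a daughter reading through its ref gets a cache hit from the parent's entry without a second copy being stored. If the parent's block has been evicted, the block is cached and found under the ref's own key instead.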


    Description

      This proposal aims to give users more flexibility to decide whether or not blocks from a parent region should be evicted, and also to optimise cache usage by resolving file reference blocks to the referred block in the cache.

      Some extra context:

      1) Originally, the default behaviour on splits was to rely on the "hbase.rs.evictblocksonclose" value to decide whether the cached blocks from the split parent should be evicted. The resulting split daughters are then opened with refs to the parent file. If "hbase.rs.prefetchblocksonopen" is set, these openings trigger a prefetch of the blocks from the split parent, now with cache keys from the ref path. That means that if "hbase.rs.evictblocksonclose" is false and "hbase.rs.prefetchblocksonopen" is true, blocks are duplicated in the cache. In scenarios where the cache is at capacity and the added latency of reading from the file system is high (for example, when reading from cloud storage), this can have a severe impact, as the prefetch for the refs triggers evictions. Also, the refs tend to be short lived, as compaction is triggered on the split daughters soon after they are opened.
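
The duplication described in 1) can be made concrete with a small simulation. This is a simplified sketch, not HBase code: the class `SplitDuplicationSketch`, the key names, and the assumption that each of the two daughter refs covers half of the parent's blocks are hypothetical and for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy simulation of cache contents after a split under the original behaviour. */
public class SplitDuplicationSketch {

    /** Returns the number of cache entries left after splitting a parent with blockCount cached blocks. */
    static int entriesAfterSplit(int blockCount, boolean evictOnClose, boolean prefetchOnOpen) {
        Set<String> cache = new HashSet<>();
        for (int i = 0; i < blockCount; i++) {
            cache.add("parent-hfile@" + i);
        }
        // Closing the parent region honours hbase.rs.evictblocksonclose.
        if (evictOnClose) {
            cache.removeIf(k -> k.startsWith("parent-hfile@"));
        }
        // Opening the daughters honours hbase.rs.prefetchblocksonopen: the same
        // blocks are cached again, keyed by the ref path instead of the parent path.
        if (prefetchOnOpen) {
            for (int i = 0; i < blockCount; i++) {
                String ref = (i < blockCount / 2) ? "bottom-ref@" : "top-ref@";
                cache.add(ref + i);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        boolean[] options = {false, true};
        for (boolean evict : options) {
            for (boolean prefetch : options) {
                System.out.println("evictblocksonclose=" + evict
                    + " prefetchblocksonopen=" + prefetch
                    + " entries=" + entriesAfterSplit(4, evict, prefetch));
            }
        }
    }
}
```

With four parent blocks, only the combination of eviction disabled and prefetch enabled leaves eight entries for four distinct blocks. In a cache that is already full, those extra entries are what force other blocks out.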

      2) HBASE-27474 changed the original behaviour described above to always evict blocks from the split parent once the split is completed, and to skip prefetch for refs (since refs are short lived). The side effect is that the daughters' blocks are only cached once compaction is completed, and compaction itself runs slower since it needs to read the blocks from the file system. On regions as large as 20GB, the performance degradation reported by users has been severe.

      This proposes a new "hbase.rs.evictblocksonsplit" configuration property that makes eviction on split configurable. Depending on the use case, the impact of mass evictions due to cache capacity may be higher, in which case users might prefer to keep evicting split parent blocks. Additionally, it modifies the way refs are handled when caching. The HBASE-27474 behaviour was to skip caching refs to avoid duplicate data in the cache as long as compaction was enabled, relying on the fact that refs from splits are usually short lived. Here, we propose modifying the search for block cache keys so that we always resolve the referenced file first and look for the related block of the referenced file in the cache. That way we avoid duplicates in the cache and also speed up scans on the split daughters, which now resolve the referenced file and read from the cache.

            People

              wchevreuil Wellington Chevreuil