HBase / HBASE-25624

Bound LoadBalancer's RegionLocationFinder cache


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.6.0, 2.4.1
    • Fix Version/s: None
    • Component/s: Balancer, master, Operability
    • Labels: None

    Description

      We have a large table in production that causes the balancer's RegionLocationFinder cache to consume 4 GB of heap. This, among other factors, triggered OOMEs and made us aware of the problem.

      RegionLocationFinder embeds a Guava cache that is populated through a CacheLoader. Over time the cache holds the RegionInfo of every region of every table, plus the HDFS block locations of every store file in those regions, and all of this consumes heap.

      The only limit we pass to the CacheBuilder is an expiration time of 14400000 milliseconds (4 hours) for individual cache entries. That is much too long. Worse, the cache also periodically refreshes itself: the need for a refresh is checked whenever BaseLoadBalancer calls RegionLocationFinder's setClusterMetrics() method, and because a refresh rewrites the entries, it defeats the expiration-based limit anyway.
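
To make the interaction concrete, here is a minimal JDK-only sketch of an expire-after-write cache with a manual clock. It is a toy model, not Guava and not the HBase code: the class name, the method names, and the one hour refresh interval are assumptions chosen for illustration. Only the 4 hour expiry comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an expire-after-write cache; NOT Guava and NOT HBase code.
class ExpiringCacheSketch {
    private final long ttlMillis;
    // key -> time of last write, in milliseconds on the manual clock
    private final Map<String, Long> writeTimes = new HashMap<>();
    long now = 0; // hypothetical manual clock

    ExpiringCacheSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    void put(String key) {
        writeTimes.put(key, now);
    }

    // Rewrites every entry, as a periodic refresh would. This resets each entry's age.
    void refreshAll() {
        writeTimes.replaceAll((key, written) -> now);
    }

    // Drops entries whose age has reached the TTL and returns how many remain.
    int sizeAfterExpiry() {
        writeTimes.values().removeIf(written -> now - written >= ttlMillis);
        return writeTimes.size();
    }

    // Advances the clock in fixed steps, optionally refreshing after each step,
    // and returns the number of entries still cached at the end.
    static int simulate(long ttl, long step, long total, boolean refresh) {
        ExpiringCacheSketch cache = new ExpiringCacheSketch(ttl);
        cache.put("region-a");
        cache.put("region-b");
        for (long elapsed = 0; elapsed < total; elapsed += step) {
            cache.now += step;
            cache.sizeAfterExpiry();      // expire stale entries first
            if (refresh) {
                cache.refreshAll();       // then rewrite the survivors
            }
        }
        return cache.sizeAfterExpiry();
    }
}
```

In this model, simulating 8 hours with a 4 hour expiry and no refresh leaves the cache empty, while the same run with an hourly refresh keeps both entries, because no entry ever reaches the expiry age.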

      We should bound this cache with effective resource controls. Time-based expiry is fine, but the periodic refresh logic must be removed to make it effective. We should implement size-based limits too. CacheBuilder#maximumSize limits the cache by number of entries. This might be fine, but CacheBuilder#maximumWeight would be better, because the weight of each entry is determined by the API user. In this case the weight can be an estimate of the heap size of the hash map entries kept in the cache.
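
The difference between counting entries and weighing them can be sketched without Guava. The class below is a hypothetical JDK-only illustration of weight-based bounding with least-recently-used eviction. It is not Guava's implementation and not a proposed patch. With Guava itself, the equivalent would be configured through CacheBuilder#maximumWeight together with a Weigher. The weigher here is whatever estimate the caller supplies, for example an approximate heap size in bytes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// JDK-only sketch of weight-based bounding; NOT Guava's implementation.
class WeightBoundedCacheSketch<K, V> {
    private final long maxWeight;
    private final ToLongFunction<V> weigher; // caller-supplied size estimate
    // accessOrder=true keeps the least recently used entry first in iteration order.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);
    private long totalWeight = 0;

    WeightBoundedCacheSketch(long maxWeight, ToLongFunction<V> weigher) {
        this.maxWeight = maxWeight;
        this.weigher = weigher;
    }

    V get(K key) {
        return map.get(key); // a hit also marks the entry as recently used
    }

    void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null) {
            totalWeight -= weigher.applyAsLong(old);
        }
        totalWeight += weigher.applyAsLong(value);
        // Evict least recently used entries until the total fits again.
        // An entry heavier than maxWeight ends up evicting itself.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            totalWeight -= weigher.applyAsLong(eldest.getValue());
            it.remove();
        }
    }

    int size() {
        return map.size();
    }

    long totalWeight() {
        return totalWeight;
    }
}
```

A count-based limit would treat a region with a handful of store files the same as one with thousands of block locations. A weigher lets the bound track the quantity that actually matters here, which is heap.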

      Default should remain unbounded. Specific bounds should be supported by new site configuration options.
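
One possible shape for such options is sketched below, using java.util.Properties as a stand-in for the site configuration. The key names, the class, and the use of -1 to mean "unbounded" are all hypothetical and invented for illustration; this issue does not define any of them. Keeping the existing 4 hour expiry as the default is likewise an assumption.

```java
import java.util.Properties;

// Hypothetical holder for cache bounds read from site configuration.
class CacheBoundsSketch {
    // Key names are invented for illustration; the issue does not name any.
    static final String MAX_WEIGHT_KEY = "example.balancer.locationcache.max.weight.bytes";
    static final String EXPIRY_KEY = "example.balancer.locationcache.expiry.ms";
    static final long UNBOUNDED = -1L; // assumed sentinel for "no size limit"

    final long maxWeightBytes;
    final long expiryMillis;

    CacheBoundsSketch(Properties site) {
        // No size bound unless the operator sets one explicitly.
        maxWeightBytes = Long.parseLong(
            site.getProperty(MAX_WEIGHT_KEY, String.valueOf(UNBOUNDED)));
        // Falls back to the 4 hour expiry described in this issue.
        expiryMillis = Long.parseLong(site.getProperty(EXPIRY_KEY, "14400000"));
    }

    boolean isSizeBounded() {
        return maxWeightBytes > 0;
    }
}
```

With an empty configuration this yields an unbounded cache, so existing deployments would see no behavior change unless an operator opts in to a bound.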

          People

            Assignee: Unassigned
            Reporter: Andrew Kyle Purtell (apurtell)
            Votes: 0
            Watchers: 8
