HBase / HBASE-23590

Update maxStoreFileRefCount to maxCompactedStoreFileRefCount


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.6.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0, 1.6.0
    • Component/s: None
    • Labels: None
    • Release Note:
      RegionsRecoveryChore, introduced in HBASE-22460, reopens regions based on the config hbase.regions.recovery.store.file.ref.count.
      The reopen decision needs to consider the ref counts of all compacted-away store files that belong to the region, not those of its active (non-compacted) store files.

      This Jira fixes that bug and updates the descriptions of the corresponding configs:

      1. hbase.master.regions.recovery.check.interval:

      Interval, in milliseconds, at which the Regions Recovery Chore runs. On each run, the chore finds all regions whose store file ref count exceeds the configured maximum and reopens them. Defaults to 20 minutes.

      2. hbase.regions.recovery.store.file.ref.count:

      A very large ref count on a compacted store file indicates a reference leak on that object (the compacted store file). Such files cannot be removed after compaction invalidates them. The only way to recover in this scenario is to reopen the region, which releases all resources such as the ref count and leases. This config sets the store file ref count threshold used to decide which regions to reopen: any region whose compacted store file ref count is greater than this value is eligible for reopening by the master. The ref count used for a region is the maximum among the ref counts of all compacted-away store files that belong to it. The default value, -1, turns this feature off; provide a positive integer to enable it.
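The reopen rule in the release note above can be sketched in Java as follows. This is a minimal illustration, not HBase's RegionsRecoveryChore code: the class RegionsRecoverySketch, the method regionsToReopen, and the input map from region name to max compacted store file ref count are all hypothetical, and the sketch assumes that any non-positive threshold disables the feature. Running the check every hbase.master.regions.recovery.check.interval milliseconds is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the reopen decision described for
 * hbase.regions.recovery.store.file.ref.count. Not HBase's actual chore code.
 */
public class RegionsRecoverySketch {

  /** Default from the release note: -1 turns the feature off. */
  static final int FEATURE_DISABLED = -1;

  /**
   * Returns the regions whose max compacted store file ref count is strictly
   * greater than the threshold.
   *
   * @param maxCompactedRefCounts hypothetical input: region name -> max ref count
   *        among that region's compacted-away store files
   * @param threshold value of the ref count config; this sketch assumes any
   *        non-positive value means "disabled"
   */
  static List<String> regionsToReopen(Map<String, Integer> maxCompactedRefCounts, int threshold) {
    List<String> toReopen = new ArrayList<>();
    if (threshold <= 0) {
      return toReopen; // feature turned off (e.g. the default -1)
    }
    for (Map.Entry<String, Integer> e : maxCompactedRefCounts.entrySet()) {
      if (e.getValue() > threshold) { // strictly greater, per the config description
        toReopen.add(e.getKey());
      }
    }
    return toReopen;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new TreeMap<>(); // TreeMap gives a deterministic order
    counts.put("region-a", 3);
    counts.put("region-b", 250);
    counts.put("region-c", 256);
    System.out.println(regionsToReopen(counts, 256));              // []
    System.out.println(regionsToReopen(counts, 100));              // [region-b, region-c]
    System.out.println(regionsToReopen(counts, FEATURE_DISABLED)); // []
  }
}
```

With a threshold of 256, region-c (ref count exactly 256) is not selected, because the config description requires the ref count to be strictly greater than the threshold.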

    Description

      As discussed on HBASE-23349, RegionsRecoveryChore should decide when to reopen a region using the max refCount on its compacted-away store files, not on its new store files. While work on HBASE-23349 is still in progress, we need to at least update the metric to report the desired refCount, i.e. the max refCount among all compacted-away store files of a given region.
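To illustrate why the metric matters, the hypothetical sketch below compares, for a single region, the max ref count over active store files with the max over compacted-away store files. StoreFileInfo and both methods are illustrative names rather than HBase APIs, and returning 0 when a region has no matching files is an assumption of this sketch. It needs Java 16 or later for records.

```java
import java.util.List;

/**
 * Hypothetical sketch contrasting the two per-region metrics.
 * StoreFileInfo is an illustrative type, not an HBase class.
 */
public class CompactedRefCountSketch {

  /** Illustrative view of a store file: its ref count and whether compaction has replaced it. */
  record StoreFileInfo(String name, int refCount, boolean compactedAway) {}

  /** Metric over active (non-compacted) store files; 0 if there are none (assumption). */
  static int maxStoreFileRefCount(List<StoreFileInfo> files) {
    return files.stream()
        .filter(f -> !f.compactedAway())
        .mapToInt(StoreFileInfo::refCount)
        .max()
        .orElse(0);
  }

  /** Metric the issue asks for: max ref count over compacted-away store files only. */
  static int maxCompactedStoreFileRefCount(List<StoreFileInfo> files) {
    return files.stream()
        .filter(StoreFileInfo::compactedAway)
        .mapToInt(StoreFileInfo::refCount)
        .max()
        .orElse(0);
  }

  public static void main(String[] args) {
    List<StoreFileInfo> region = List.of(
        new StoreFileInfo("new-hfile", 4, false),     // active file with normal readers
        new StoreFileInfo("old-hfile-1", 0, true),    // compacted away, references released
        new StoreFileInfo("old-hfile-2", 900, true)); // compacted away, leaked references
    System.out.println(maxStoreFileRefCount(region));          // 4
    System.out.println(maxCompactedStoreFileRefCount(region)); // 900
  }
}
```

In this example, the leaked references on old-hfile-2 do not show up in the first metric (4) but dominate the second (900), which is the value to compare against hbase.regions.recovery.store.file.ref.count.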


            People

              Assignee: Viraj Jasani (vjasani)
              Reporter: Viraj Jasani (vjasani)
              Votes: 0
              Watchers: 3
