[HBASE-23590] Update maxStoreFileRefCount to maxCompactedStoreFileRefCount - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.0.0-alpha-1, 2.3.0, 1.6.0
Fix Version/s: 3.0.0-alpha-1, 2.3.0, 1.6.0
Component/s: None
Labels: None

Release Note:

RegionsRecoveryChore, introduced as part of HBASE-22460, tries to reopen regions based on the config hbase.regions.recovery.store.file.ref.count.
Region reopen needs to take into consideration all compacted-away store files that belong to the region, not the current (non-compacted) store files.

Fixed this bug as part of this Jira.
Updated the descriptions of the corresponding configs:

1. hbase.master.regions.recovery.check.interval:

Regions Recovery Chore interval in milliseconds. This chore runs at this interval to find all regions whose store file ref count exceeds the configurable threshold, and reopens them. Defaults to 20 minutes (1200000 ms).

2. hbase.regions.recovery.store.file.ref.count:

A very large ref count on a compacted store file indicates a ref leak on that object (the compacted store file). Such a file cannot be removed after it has been invalidated by compaction. The only way to recover in this scenario is to reopen the region, which releases all resources such as the refcount, leases, etc. This config sets the store file ref count threshold considered for reopening regions: any region whose compacted store files have a ref count > this value is eligible for reopening by the master. The value compared is the max refCount among all compacted-away store files that belong to the region. The default value of -1 turns this feature off; provide a positive integer to enable it.
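The eligibility rule described above can be sketched in Java as follows. This is a minimal, hypothetical model, not the actual RegionsRecoveryChore code: the RegionRefs class and the regionsToReopen method are illustrative names, each region is reduced to the list of ref counts on its compacted-away store files, and treating every non-positive threshold as "disabled" is an assumption drawn from the note that only positive values enable the feature.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionsRecoverySketch {

    /** Hypothetical per-region snapshot: ref counts of the region's compacted-away store files. */
    static final class RegionRefs {
        final String regionName;
        final List<Integer> compactedStoreFileRefCounts;

        RegionRefs(String regionName, List<Integer> compactedStoreFileRefCounts) {
            this.regionName = regionName;
            this.compactedStoreFileRefCounts = compactedStoreFileRefCounts;
        }
    }

    /** Max refCount among all compacted-away store files of the region (0 if there are none). */
    static int maxCompactedStoreFileRefCount(RegionRefs region) {
        int max = 0;
        for (int refCount : region.compactedStoreFileRefCounts) {
            max = Math.max(max, refCount);
        }
        return max;
    }

    /**
     * Names of regions eligible for reopening: max compacted store file refCount > threshold.
     * Assumption: any non-positive threshold (including the default -1) disables the check.
     */
    static List<String> regionsToReopen(List<RegionRefs> regions, int threshold) {
        List<String> eligible = new ArrayList<>();
        if (threshold <= 0) {
            return eligible; // feature turned off
        }
        for (RegionRefs region : regions) {
            if (maxCompactedStoreFileRefCount(region) > threshold) {
                eligible.add(region.regionName);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<RegionRefs> regions = List.of(
                new RegionRefs("region-a", List.of(1, 3)),
                new RegionRefs("region-b", List.of(2, 512)),
                new RegionRefs("region-c", List.of()));
        System.out.println(regionsToReopen(regions, 256)); // prints [region-b]
        System.out.println(regionsToReopen(regions, -1));  // prints []
    }
}
```

The comparison is strict (>), so a region whose max compacted store file ref count equals the threshold is not reopened.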

Description

As per the discussion on HBASE-23349, RegionsRecoveryChore should use the max refCount on compacted-away store files, not on new store files, to determine when to reopen a region. Although work on HBASE-23349 is still in progress, we need to at least update the metric to report the desired refCount, i.e. the max refCount among all compacted-away store files of a given region.
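To make the difference between the two metrics concrete, here is a hypothetical Java sketch. The StoreFileInfo class and both methods are illustrative stand-ins named after the metrics in the issue title, not HBase APIs. It assumes a region holding one busy current store file (its refCount is assumed here to reflect open scanners) and one compacted-away store file that is still referenced.

```java
import java.util.List;

public class RefCountMetricSketch {

    /** Hypothetical view of one store file: its current refCount and whether compaction replaced it. */
    static final class StoreFileInfo {
        final String name;
        final int refCount;
        final boolean compactedAway;

        StoreFileInfo(String name, int refCount, boolean compactedAway) {
            this.name = name;
            this.refCount = refCount;
            this.compactedAway = compactedAway;
        }
    }

    /** Max refCount over the region's current (non-compacted) store files: the metric being replaced. */
    static int maxStoreFileRefCount(List<StoreFileInfo> files) {
        return maxRefCount(files, false);
    }

    /** Max refCount over the region's compacted-away store files: the metric this issue asks for. */
    static int maxCompactedStoreFileRefCount(List<StoreFileInfo> files) {
        return maxRefCount(files, true);
    }

    // Shared helper: max refCount among files whose compactedAway flag matches; 0 if none match.
    private static int maxRefCount(List<StoreFileInfo> files, boolean compactedAway) {
        int max = 0;
        for (StoreFileInfo file : files) {
            if (file.compactedAway == compactedAway) {
                max = Math.max(max, file.refCount);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        List<StoreFileInfo> files = List.of(
                new StoreFileInfo("hfile-new", 40, false),  // current file with open scanners
                new StoreFileInfo("hfile-old", 900, true)); // compacted away, yet still referenced
        System.out.println("maxStoreFileRefCount = " + maxStoreFileRefCount(files));                   // 40
        System.out.println("maxCompactedStoreFileRefCount = " + maxCompactedStoreFileRefCount(files)); // 900
    }
}
```

With a threshold of, say, 256, the old metric (40) would never flag this region even though the compacted-away file cannot be removed while it stays referenced; the new metric (900) makes the region eligible for reopening.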

Issue Links

links to

GitHub Pull Request #950

People

Assignee: Viraj Jasani

Reporter: Viraj Jasani

Votes: 0

Watchers: 3

Dates

Created: 18/Dec/19 13:29

Updated: 09/Jan/20 20:23

Resolved: 01/Jan/20 18:34