  1. HBase
  2. HBASE-22457 Harden the HBase HFile reader reference counting
  3. HBASE-22460

Reopen a region if store reader references may have leaked

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 1.5.0, 2.3.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Leaked store files cannot be removed even after they have been invalidated by compaction. A reasonable mitigation for a reader reference leak is a fast reopen of the region on the same server.

      Configs:

      1. hbase.master.regions.recovery.check.interval :

      Interval of the Regions Recovery Chore, in milliseconds. The chore runs at this interval, finds all regions whose maximum store file ref count exceeds the configured threshold, and reopens them. Defaults to 20 minutes (1200000 ms).

      2. hbase.regions.recovery.store.file.ref.count :

      Store file ref count threshold used to decide whether a region should be reopened. Any region with a store file ref count greater than this value is eligible for reopening by the master. The default value of -1 turns the feature off. Provide a positive integer to enable it.
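The threshold semantics described above can be sketched in plain Java. This is a minimal illustration, not the actual chore code in HBase: the class `RegionsRecoverySketch`, the method `regionsToReopen`, and the region names are hypothetical, and only the two config keys and their defaults come from the release note.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not HBase's chore implementation.
final class RegionsRecoverySketch {
    // Config keys and defaults as stated in the release note.
    static final String CHECK_INTERVAL_KEY = "hbase.master.regions.recovery.check.interval";
    static final String REF_COUNT_THRESHOLD_KEY = "hbase.regions.recovery.store.file.ref.count";
    static final int DEFAULT_CHECK_INTERVAL_MS = 20 * 60 * 1000; // 20 minutes
    static final int DEFAULT_REF_COUNT_THRESHOLD = -1;           // feature turned off

    /** The feature is enabled only when the threshold is a positive integer. */
    static boolean isEnabled(int threshold) {
        return threshold > 0;
    }

    /**
     * Returns the regions whose maximum store file ref count is strictly
     * greater than the threshold. Returns an empty list when disabled.
     */
    static List<String> regionsToReopen(Map<String, Integer> maxRefCountByRegion, int threshold) {
        List<String> eligible = new ArrayList<>();
        if (!isEnabled(threshold)) {
            return eligible;
        }
        for (Map.Entry<String, Integer> entry : maxRefCountByRegion.entrySet()) {
            if (entry.getValue() > threshold) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Made-up region names and ref counts, for demonstration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("region-a", 3);
        counts.put("region-b", 300);
        counts.put("region-c", 256);

        // Only region-b exceeds 256; region-c equals the threshold and is skipped.
        System.out.println(regionsToReopen(counts, 256));
        // With the default of -1 nothing is reopened.
        System.out.println(regionsToReopen(counts, DEFAULT_REF_COUNT_THRESHOLD));
    }
}
```

Note that the comparison is strict, matching "ref count > this value" in the config description, so a region sitting exactly at the threshold is left alone.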

      Description

      We can leak store reader references if a coprocessor or core function opens a scanner, or wraps one, and then does not call close on the scanner or the wrapped instance. A reasonable mitigation for a reader reference leak would be a fast reopen of the region on the same server (initiated by the RS). This will release all resources, such as the refcount and leases. Clients should ride over this gracefully, like any other region transition. The reopen would resemble what is done when applying a schema change and ideally would reuse the relevant code. If the refcount exceeds some ridiculous threshold, this mitigation could be triggered along with a fat WARN in the logs.
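The leak described above can be illustrated with a small reference-counting sketch. The classes `Reader` and `ScannerHandle` are hypothetical stand-ins, not HBase classes. They only model the idea that opening a scanner takes a reference on the reader and that only close releases it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for a store file reader and a scanner over it.
final class ReaderRefLeakSketch {

    static final class Reader {
        private final AtomicInteger refCount = new AtomicInteger();

        /** Opening a scanner takes one reference on the reader. */
        ScannerHandle openScanner() {
            refCount.incrementAndGet();
            return new ScannerHandle(this);
        }

        int refCount() {
            return refCount.get();
        }

        /** A compacted-away file can only be removed once nothing references it. */
        boolean canBeRemoved() {
            return refCount.get() == 0;
        }
    }

    static final class ScannerHandle implements AutoCloseable {
        private final Reader reader;
        private boolean closed;

        ScannerHandle(Reader reader) {
            this.reader = reader;
        }

        /** Releases the reference exactly once, even if close is called twice. */
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                reader.refCount.decrementAndGet();
            }
        }
    }

    public static void main(String[] args) {
        Reader reader = new Reader();

        // Well-behaved caller: try-with-resources guarantees close.
        try (ScannerHandle scanner = reader.openScanner()) {
            System.out.println("while scanning: " + reader.refCount());
        }
        System.out.println("after close: " + reader.refCount());

        // Leaky caller: the scanner is opened and never closed,
        // so the reference is never released.
        reader.openScanner();
        System.out.println("removable after leak: " + reader.canBeRemoved());
    }
}
```

In this model, discarding the `Reader` object and creating a fresh one is the analogue of reopening the region: the stale count disappears along with the old instance.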

              People

              • Assignee: Viraj Jasani (vjasani)
              • Reporter: Andrew Kyle Purtell (apurtell)
              • Votes: 0
              • Watchers: 13
