HBase / HBASE-23222

Better logging and mitigation for MOB compaction failures


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.0.0, 2.2.0
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0, 2.1.8, 2.2.3
    • Component/s: mob
    • Labels: None
    • Release Note:
      <!-- markdown -->

      The MOB compaction process in the HBase Master now logs more about its activity.

      In the event that you run into the problems described in HBASE-22075, there is a new HFileCleanerDelegate that will stop all removal of MOB hfiles from the archive area. It can be configured by adding `org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner` to the list configured for `hbase.master.hfilecleaner.plugins`. This new cleaner delegate will cause your archive area to grow unbounded; you will have to manually prune files, which may be prohibitively complex. Consider whether your use case will allow you to mitigate by disabling MOB compactions instead.

      Caveats:
      * Be sure the list of cleaner delegates still includes the default cleaners you will likely need: ttl, snapshot, and hlink.
      * Be mindful that if you enable this cleaner delegate then there will be *no* automated process for removing these mob hfiles. You should see a single region per table in `%hbase_root%/archive` that accumulates files over time. You will have to determine which of these files are safe to remove.
      * You should list this cleaner delegate after the snapshot and hlink delegates so that you can enable sufficient logging to determine when an archived mob hfile is needed by those subsystems. When set to `TRACE` logging, the CleanerChore logger will include archive retention decision justifications.
      * If your use case creates a large number of uniquely named tables, this new delegate will cause memory pressure on the master.
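
      As a sketch of the configuration described above, `hbase-site.xml` might look like the following. The default cleaner class names shown (ttl, snapshot, and hlink) are assumptions based on typical HBase 2.x deployments; verify them against your version before relying on this.

      ```xml
      <property>
        <name>hbase.master.hfilecleaner.plugins</name>
        <!-- Keep the default ttl, snapshot, and hlink cleaner delegates,
             and list ManualMobMaintHFileCleaner after the snapshot and
             hlink delegates, per the caveats above. -->
        <value>org.apache.hadoop.hbase.master.cleaner.TimeToLiveHFileCleaner,org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner,org.apache.hadoop.hbase.master.cleaner.HFileLinkCleaner,org.apache.hadoop.hbase.mob.ManualMobMaintHFileCleaner</value>
      </property>
      ```

      To see the archive retention decision justifications mentioned above, raise the CleanerChore logger to `TRACE` in your logging configuration, e.g. `log4j.logger.org.apache.hadoop.hbase.master.cleaner.CleanerChore=TRACE` in `log4j.properties` (logger name assumed from the class's package; confirm against your HBase version).
      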

      Description

      Some logging and mitigation options for the MOB data loss issues described in HBASE-22075.

      Attachments

      Issue Links

      Activity

      People

      • Assignee:
        busbey Sean Busbey
      • Reporter:
        busbey Sean Busbey
      • Votes:
        0 Vote for this issue
      • Watchers:
        3 Start watching this issue

      Dates

      • Created:
      • Updated:
      • Resolved: