HBase / HBASE-3404

Compaction Ordering for Bulk Import Files

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.90.0, 0.90.1, 0.92.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      We ran into an issue today while using HFileOutputFormat to perform an incremental load on an already-large cluster. Because bulk-loaded files don't have a sequence ID, they are placed at the front of the StoreFile list. This resulted in the following StoreFile ordering:

      2GB (bulk) => 25GB => 2GB => ...

      This triggered a 30+GB major compaction for every single region. Ideally, bulk import files would be ordered in the compaction list according to their time of insertion, so that this becomes a much smaller compaction, and we would rely on StoreFile age to trigger major compactions.
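To make the effect of the ordering concrete, here is a minimal sketch of a ratio-based selection over files ordered oldest first. It is a simplified illustration, not the actual HBase selection code: the class and method names, the 1.2 ratio, the minimum of 3 files, and the extra 1GB file standing in for the elided tail of the list are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CompactionSelectionSketch {
    /** Hypothetical stand-in for a StoreFile: a label and a size in GB. */
    static final class FileInfo {
        final String label;
        final double sizeGb;
        FileInfo(String label, double sizeGb) {
            this.label = label;
            this.sizeGb = sizeGb;
        }
    }

    // Assumed tuning values, for illustration only.
    static final double RATIO = 1.2;
    static final int MIN_FILES = 3;

    /**
     * Files are ordered oldest first. Leading files that are larger than
     * RATIO times the combined size of all newer files are skipped; every
     * remaining file is selected for compaction.
     */
    static List<FileInfo> select(List<FileInfo> oldestFirst) {
        int n = oldestFirst.size();
        // sumNewer[i] = total size of the files that come after index i
        double[] sumNewer = new double[n];
        for (int i = n - 2; i >= 0; i--) {
            sumNewer[i] = sumNewer[i + 1] + oldestFirst.get(i + 1).sizeGb;
        }
        int start = 0;
        while (n - start >= MIN_FILES
                && oldestFirst.get(start).sizeGb > RATIO * sumNewer[start]) {
            start++;
        }
        if (n - start < MIN_FILES) {
            return new ArrayList<>();
        }
        return new ArrayList<>(oldestFirst.subList(start, n));
    }

    static double totalGb(List<FileInfo> files) {
        double total = 0;
        for (FileInfo f : files) {
            total += f.sizeGb;
        }
        return total;
    }

    public static void main(String[] args) {
        // Bulk file sorted to the front because it has no sequence ID.
        List<FileInfo> bulkInFront = Arrays.asList(
                new FileInfo("bulk", 2), new FileInfo("a", 25),
                new FileInfo("b", 2), new FileInfo("c", 1));
        // Bulk file ordered by its time of insertion (newest).
        List<FileInfo> bulkAtInsertion = Arrays.asList(
                new FileInfo("a", 25), new FileInfo("b", 2),
                new FileInfo("c", 1), new FileInfo("bulk", 2));
        System.out.println(totalGb(select(bulkInFront)));     // prints 30.0
        System.out.println(totalGb(select(bulkAtInsertion))); // prints 5.0
    }
}
```

Under these assumptions the small bulk file at the front passes the ratio check, so every file behind it, including the 25GB one, is pulled into the same compaction. With the bulk file at the end, the 25GB file is skipped and only the small files are compacted.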

          Activity

          Hudson added a comment -

          Integrated in HBase-TRUNK #2427 (See https://builds.apache.org/job/HBase-TRUNK/2427/)
          HBASE-3690 Option to Exclude Bulk Import Files from Minor Compaction

          Summary:
          We ran an incremental scrape with HFileOutputFormat and
          encountered major compaction storms. This is caused by the bug in
          HBASE-3404. The permanent fix is a little tricky without HBASE-2856. We
          realized that a quicker solution for avoiding these compaction storms is
          to simply exclude bulk import files from minor compactions and let them
          only be handled by time-based major compactions. Add this functionality
          along with a config option to enable it.

          Rewrote this feature to be done on a per-bulkload basis.

          Test Plan:

          • mvn test -Dtest=TestHFileOutputFormat

          DiffCamp Revision:

          Reviewers: stack, Kannan, JIRA, dhruba

          Reviewed By: stack

          CC: dhruba, lhofhansl, nspiegelberg, stack

          Differential Revision: 357

          nspiegelberg :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
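
The workaround described in the commit summary above (excluding bulk import files from minor compactions on a per-bulkload basis) can be pictured as a per-file flag that is checked when compaction candidates are gathered. The following is only a sketch under that reading: the class, the `excludeFromMinorCompaction` field, and the `candidates` method are hypothetical names, not the identifiers used in the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkLoadExclusionSketch {
    /** Hypothetical stand-in for a StoreFile. */
    static final class FileInfo {
        final String label;
        // Hypothetical per-file flag, assumed to be written at bulk-load time.
        final boolean excludeFromMinorCompaction;
        FileInfo(String label, boolean excludeFromMinorCompaction) {
            this.label = label;
            this.excludeFromMinorCompaction = excludeFromMinorCompaction;
        }
    }

    /**
     * Returns the files eligible for a compaction. Flagged files are skipped
     * for minor compactions but always included in major compactions.
     */
    static List<FileInfo> candidates(List<FileInfo> files, boolean majorCompaction) {
        List<FileInfo> eligible = new ArrayList<>();
        for (FileInfo f : files) {
            if (majorCompaction || !f.excludeFromMinorCompaction) {
                eligible.add(f);
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("bulk", true));
        files.add(new FileInfo("flushed-1", false));
        files.add(new FileInfo("flushed-2", false));
        System.out.println(candidates(files, false).size()); // prints 2
        System.out.println(candidates(files, true).size());  // prints 3
    }
}
```

Keeping the flag on the file rather than in a global setting matches the "per-bulkload" wording: each load can decide whether its output should wait for a time-based major compaction.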
          Nicolas Spiegelberg added a comment -

          We need to work with Amit to get bulk files placed in the proper location by obtaining a sequence ID for the entire HFile that is being bulk imported. I think it's sufficient to use the sequence ID at the time of file creation instead of at the time of online insertion. The only problem would be a second major compaction if one occurred during the HFile creation.

          Nicolas Spiegelberg added a comment -

          The perils of choosing a sequence ID have been discussed in HBASE-1923. The biggest pitfall seems to be making sure that the bulk upload seqid < lowest unflushed sequence ID. The discussion mentions that duplicate seqids would be perilous, but I'm still trying to understand why this is the case. The main problems seemed to be:

          1. StoreFiles was structured as Map<seqid, StoreFile>, and nobody wanted to switch to a MultiMap. This is no longer the case.
          2. If 2 StoreFiles have a duplicate ROW+COL+TS, we discard the one with the lowest StoreFile seqid on compaction.

          Problem #2 is by far the harder one to worry about. A couple of options I have been thinking about:

          1. Query the region for the lowest unflushed sequence ID on HFile create. Don't worry about duplicate seqids.
          2. Like daughter files on split, create a special meta file in HRegionServer::bulkLoadHFile() that contains the seqid.

          Option #1 is easier to implement, but has a couple of edge-case bugs. I don't think you have to worry about duplicate seqids; is there something obscure here? More importantly, if a compaction happens between the HFOF HFile creation and HRegionServer::bulkLoadHFile(), then you don't deterministically know whether the bulk load file will have the oldest seqid when it is inserted. Option #2 is a pretty solid solution, albeit a little hacky. It needs more code modification and could also be prone to odd race conditions (e.g. the RS dies between writing the bulk seqid file and moving the actual bulk file). Thoughts?

          Todd Lipcon added a comment -

          How can we order it in the compaction list at the time of insertion? That would require editing the HFile in place to set its seqid, no?

          Nicolas Spiegelberg added a comment -

          @Todd: You're thinking of the original idea for max compaction size pruning in HBASE-3209, where my first diff attempted to sort the files by size and then compact. The trouble is that we use the StoreFile seqid during compaction to decide which value to keep when we encounter a duplicate ROW+COL+TS. For example:

          1. we could have 3 StoreFiles with seqids [1,2,3]
          2. there is a duplicate in 1 & 2
          3. we sort by size and 2 is small, so we compact [1,3]
          4. the resulting StoreFile has seqid 3, so its copy of the duplicate (which came from 1) would beat seqid 2 in a later compaction

          This should be addressed in HBASE-2856, which associates each KV with a seqid. However, I was hoping to get a quicker fix in for this case.
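
The four steps above can be replayed with a small model. This is a sketch under simplifying assumptions, not HBase code: each file is reduced to a seqid plus a map from a "ROW+COL+TS" string to a value, the compacted file takes the maximum seqid of its inputs, and on a duplicate key the copy from the higher-seqid file wins. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SeqIdConflictSketch {
    /** Hypothetical stand-in for a StoreFile: a seqid plus key -> value cells. */
    static final class StoreFileSketch {
        final long seqId;
        final Map<String, String> cells;
        StoreFileSketch(long seqId, Map<String, String> cells) {
            this.seqId = seqId;
            this.cells = cells;
        }
    }

    /**
     * Merges the given files. Files are applied in ascending seqid order, so
     * on a duplicate key the value from the highest seqid overwrites the rest.
     * The result carries the maximum seqid of its inputs.
     */
    static StoreFileSketch compact(List<StoreFileSketch> files) {
        List<StoreFileSketch> sorted = new ArrayList<>(files);
        sorted.sort((a, b) -> Long.compare(a.seqId, b.seqId));
        Map<String, String> merged = new TreeMap<>();
        long maxSeqId = Long.MIN_VALUE;
        for (StoreFileSketch f : sorted) {
            merged.putAll(f.cells);
            maxSeqId = Math.max(maxSeqId, f.seqId);
        }
        return new StoreFileSketch(maxSeqId, merged);
    }

    public static void main(String[] args) {
        String key = "row/col/ts";
        StoreFileSketch f1 = new StoreFileSketch(1, Collections.singletonMap(key, "old"));
        StoreFileSketch f2 = new StoreFileSketch(2, Collections.singletonMap(key, "new"));
        StoreFileSketch f3 = new StoreFileSketch(3, Collections.singletonMap("other/col/ts", "x"));

        // Compacting everything in seqid order keeps the value from seqid 2.
        System.out.println(compact(Arrays.asList(f1, f2, f3)).cells.get(key)); // prints new

        // Size-sorted selection compacts [1,3] first; the result has seqid 3.
        StoreFileSketch f13 = compact(Arrays.asList(f1, f3));
        // A later compaction with file 2 now lets the stale value win.
        System.out.println(compact(Arrays.asList(f13, f2)).cells.get(key));    // prints old
    }
}
```

The second result is the failure described in step 4: the value written under seqid 1 inherits seqid 3 from the merged file and then overrides the newer value from seqid 2.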

          Todd Lipcon added a comment -

          I thought we had decided at some point that this ordering didn't matter anymore? If so, could we sort the storefiles by size before deciding which files to compact?


            People

            • Assignee: Nicolas Spiegelberg
            • Reporter: Nicolas Spiegelberg
            • Votes: 0
            • Watchers: 6
