HBASE-834: 'Major' compactions and upper bound on files we compact at any one time

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.2.1, 0.18.0
    • Fix Version/s: 0.2.1, 0.18.0
    • Component/s: None
    • Labels: None

      Description

      From Billy in HBASE-64, which we closed because it got pulled all over the place:

      Currently we do a compaction on a region when hbase.hstore.compactionThreshold is reached (default 3).

      I think we should configure a max number of mapfiles to compact at one time, similar to doing a minor compaction in Bigtable. This keeps compactions from getting tied up in one region too long, letting other regions get way too many memcache flushes, making compaction take longer and longer for each region.

      If we did that, when a region's updates start to slack off, the max number will eventually include all mapfiles, causing a major compaction on that region. Unlike Bigtable, this would leave the master out of the process, letting the region server handle the major compaction when it has time.

      When doing a minor compaction on a few files, I think we should compact the newest mapfiles first and leave the larger/older ones for when we have few updates to a region.
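
      A minimal Java sketch of the selection policy described above. It is illustrative only: the class and method names are invented here, not HStore code, and it assumes store files are listed oldest-first.

      import java.util.Collections;
      import java.util.List;

      // Illustrative only: cap a minor compaction at the newest N store files,
      // leaving the older/larger ones for a later major compaction.
      class MinorCompactionSelectionSketch {
        /**
         * @param storeFiles file names ordered oldest-first
         * @param compactionThreshold hbase.hstore.compactionThreshold (default 3)
         * @param maxFilesPerCompaction the proposed cap (10 per this thread)
         */
        static List<String> selectNewest(List<String> storeFiles,
            int compactionThreshold, int maxFilesPerCompaction) {
          if (storeFiles.size() < compactionThreshold) {
            return Collections.emptyList();   // not enough files to bother compacting
          }
          int count = Math.min(storeFiles.size(), maxFilesPerCompaction);
          // Newest files sit at the end of an oldest-first list.
          return storeFiles.subList(storeFiles.size() - count, storeFiles.size());
        }
      }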
      
      1. 834.patchv4-trunk.txt
        1.0 kB
        Billy Pearson
      2. 834-0.2.1-patchv4.txt
        0.9 kB
        Billy Pearson
      3. 834-0.2.1-patchv3.txt
        8 kB
        Billy Pearson
      4. 834-0.2.1-patchv2.txt
        5 kB
        Billy Pearson
      5. 834-0.2.1-patch.txt
        5 kB
        Billy Pearson
      6. 834-patch.txt
        5 kB
        Billy Pearson

        Activity

        stack added a comment -

        FYI Billy, better to just open a new issue rather than reopen an old one. Re-resolving this one.

        stack added a comment -

        Ok. Thanks Billy. Applied the patch. Looked harmless; just a check of >= rather than >.

        Billy Pearson added a comment -

        834.patchv4-trunk.txt
        Has just the changes to make it work correctly on trunk

        Billy Pearson added a comment -

        834-0.2.1-patchv4.txt
        Has just the changes to make it work correctly on branch 0.2

        Billy Pearson added a comment -

        Got a chance to run some tests and found max files to compact at one time not working correctly.

        stack added a comment -

        Oh, I applied this patch to branch and trunk. Fixed comments where it talked about the 'force' parameter instead of the new 'majorCompaction' parameter. The patch failed going against TRUNK, but the hunks that didn't go in we don't want anyway; just left them out. Thanks for the patch Billy.

        stack added a comment -

        I did a bunch of testing. Was able to load a table, delete, then refill into a table of the same name and schema three times, which is much better than I could do previously. But then on the fourth time, when I go to load a table and check the .META. table, there are old historian edits showing. I'm hoping this is HBASE-855. Will retest when HBASE-855 has a patch.

        stack added a comment -

        Changed the subject.

        stack added a comment -

        Just what the doctor ordered. I agree w/ your expiration reasoning. Let me do some testing. Will get back to you.

        Billy Pearson added a comment -

        If everything looks good let me know and I will make up a patch that will apply to 0.18.0 also.

        Billy Pearson added a comment -

        Might check the javadocs etc.; not sure what I need to do there. I made some changes, but please review them for me and let me know if I am missing anything.

        Second thought on the TTL on minor compaction: a deleted record will be removed in minor compactions if its TTL is expired, but the record will
        remain until the major compaction, or a compaction that includes the cell to be deleted, and it will be deleted then. Also, since the cell
        itself will have an expired TTL, we should not get it in a get/scanner, so I think it is still OK to leave the TTL code to do its work on minor compactions.
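
        A tiny sketch of the reasoning above, with hypothetical names: a cell whose TTL has lapsed is already filtered out of gets and scans, which is why removing it during a minor compaction cannot change query results.

        // Illustrative only: the TTL test is independent of whether the
        // compaction is minor or major.
        class TtlSketch {
          static boolean isExpired(long cellTimestamp, long ttlMillis, long now) {
            return ttlMillis > 0 && now - cellTimestamp > ttlMillis;  // 0 means no TTL
          }
        }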

        Billy Pearson added a comment -

        OK, latest version of the patch:
        834-0.2.1-patchv3.txt

        Changed force to majorCompaction; a better name.

        Passed majorCompaction down to the compactHStoreFiles function so we know not to remove cells beyond max versions on a minor compaction / incremental compaction.
        This should solve the problem with HBASE-826.
        Stack said we do not need the max versions or expiration code on a minor compaction, but I left the expiration code the same,
        because if the data is past its TTL it will not matter if it's a minor or a major compaction, from what I can reason. But I might be wrong; let me know if that needs to be changed also.
        So as of now it will be removed, so we do not have to read it again and remove it on the next compaction; less work later, in theory.

        Looks like there is a bug in the rest/Dispatcher.java file; current branch-0.2 will not compile clean, but I think my patch will build clean if that error is fixed.

        Yes, the minor compaction has the limit of hbase.hstore.compaction.max,
        and major compactions do not have this limit.
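
        For reference, a hedged sketch of reading the two settings this thread mentions via Hadoop's Configuration; the defaults shown (10 files, 1 day) are simply the values quoted in this thread, not authoritative, and the millisecond unit is an assumption.

        import org.apache.hadoop.conf.Configuration;

        // Illustrative only: the two knobs discussed in this issue.
        class CompactionConfigSketch {
          static void dump(Configuration conf) {
            // Upper bound on files a single minor compaction takes on (10 per this thread).
            int maxFilesPerMinor = conf.getInt("hbase.hstore.compaction.max", 10);
            // How often a store should be major compacted (1 day per this thread; milliseconds assumed).
            long majorCompactionPeriod = conf.getLong("hbase.hregion.majorcompaction", 24L * 60 * 60 * 1000);
            System.out.println("max files per minor compaction: " + maxFilesPerMinor
                + ", major compaction period (ms): " + majorCompactionPeriod);
          }
        }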

        stack added a comment -

        Patch looks good Billy. I haven't tested it because after banging my head against hbase-826, I've learned that this notion of major compaction is a bit more involved than I at first thought (I think you may have known all along how important the difference between minor and major is).

        Here is what I learned. While compacting, if we overrun max versions or a cell has expired, we do not let the cell go through to the compacted file. That was fine in the old days, when we always compacted everything. Since we got smarter compacting – i.e. minor compactions only compacting the small files – this behavior can make for malignant results (See towards end of hbase-826 for an illustration).

        So, Billy, you need to add passing of the 'force' flag down into HStore#compact (we should probably rename 'force' as 'majorCompaction' or something?). Then in HStore#compact, we only do the max versions and expiration code IF it's a major compaction. Otherwise, we just let ALL cells go through to the compacted files (at runtime, the get and scan respect max versions and expiration times).

        I'll be on IRC tomorrow if you want to chat more on this Billy or just write notes into this JIRA and we can back and forth here (If you want, post a rough patch and I can give feedback – that might be best).

        Oh, one other thing: there should be no maximum on the number of files to compact at a time when doing a major compaction, and I think the way your patch is written, there isn't; it's only when minor compactions run that there is a limit – is that so?

        Thanks.
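
        A short sketch of the behavior described above, with invented names (compactCells, Cell); the real change would live in HStore#compact. Version and TTL limits are enforced only for a major compaction, while a minor compaction passes every cell through and relies on gets/scans to enforce the limits at read time.

        import java.util.ArrayList;
        import java.util.List;

        // Illustrative only, not the HStore code.
        class CompactionFilterSketch {
          static final class Cell {
            final long timestamp;
            final long ttlMillis;   // 0 means no TTL
            Cell(long timestamp, long ttlMillis) { this.timestamp = timestamp; this.ttlMillis = ttlMillis; }
          }

          /** cells are the versions of one row/column, newest first, as compaction reads them. */
          static List<Cell> compactCells(List<Cell> cells, int maxVersions, boolean majorCompaction) {
            if (!majorCompaction) {
              return cells;                              // minor compaction: let all cells through
            }
            long now = System.currentTimeMillis();
            List<Cell> kept = new ArrayList<Cell>();
            for (Cell c : cells) {
              if (kept.size() >= maxVersions) {
                break;                                   // overran max versions
              }
              if (c.ttlMillis > 0 && now - c.timestamp > c.ttlMillis) {
                continue;                                // cell has expired
              }
              kept.add(c);
            }
            return kept;
          }
        }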

        Jim Kellerman added a comment -

        Change to blocker. Also move from 0.19 to 0.18

        Billy Pearson added a comment -

        Attached new version with above comments fixed

        stack added a comment -

        Oh, one other thing... does the test of whether to do a major compaction have to happen inside the synchronized block on this.storefiles? i.e. ' 742 synchronized (storefiles) {' Can it be done outside of this block?
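
        One way to read that question, as a hedged sketch (names invented, not the patch): the time-based decision only looks at timestamps, so it can be computed before entering the lock, leaving synchronized (storefiles) to guard only the snapshot of files being compacted.

        import java.util.ArrayList;
        import java.util.List;
        import java.util.SortedMap;
        import java.util.TreeMap;

        // Illustrative only.
        class CompactionLockSketch {
          private final SortedMap<Long, String> storefiles = new TreeMap<Long, String>();
          private volatile long lowestFileTimestamp;                          // maintained elsewhere in this sketch
          private final long majorCompactionPeriod = 24L * 60 * 60 * 1000;    // 1 day, assumed

          List<String> snapshotForCompaction() {
            // Decision made outside the lock: it only reads timestamps.
            boolean majorCompaction = lowestFileTimestamp > 0
                && lowestFileTimestamp < System.currentTimeMillis() - majorCompactionPeriod;

            synchronized (storefiles) {
              // The lock is only needed to take a consistent snapshot of the file list.
              List<String> files = new ArrayList<String>(storefiles.values());
              // Minor compaction: cap at the newest 10 files (the default quoted in this thread).
              return majorCompaction ? files
                  : files.subList(Math.max(0, files.size() - 10), files.size());
            }
          }
        }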

        stack added a comment -

        I tried the patch. Here's a filtered extract from the logs that just shows the new messages around the major compaction test:

        2008-08-26 04:31:02,419 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:31:02,420 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/historian
        2008-08-26 04:31:02,877 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0
        2008-08-26 04:31:02,878 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/info
        2008-08-26 04:31:08,588 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
        2008-08-26 04:31:08,588 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 652690253/info
        2008-08-26 04:31:37,237 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
        2008-08-26 04:31:37,237 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1648250611/info
        2008-08-26 04:31:59,721 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4 
        2008-08-26 04:31:59,721 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 433766857/info
        2008-08-26 04:32:23,407 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
        2008-08-26 04:32:23,407 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 532635319/info
        2008-08-26 04:32:40,876 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
        2008-08-26 04:32:40,876 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 478968074/info
        2008-08-26 04:33:16,252 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4
        2008-08-26 04:33:16,252 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 305918941/info
        2008-08-26 04:33:32,483 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 4 
        2008-08-26 04:33:32,483 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 629593107/info
        2008-08-26 04:49:28,735 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:49:28,735 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/historian
        2008-08-26 04:49:29,218 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:49:29,218 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1028785192/info
        2008-08-26 04:57:20,395 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:57:32,731 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:58:23,362 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:58:23,362 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 1720995599/info
        2008-08-26 04:58:44,441 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:58:56,754 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        2008-08-26 04:59:41,982 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Hours sense last major compaction: 0 
        ...
        

        We seem to be major compacting too much – even though I'd set major compaction time down to 30 minutes instead of 24 hours so I could test. (It's probably this test: 'if (lowTimestamp < System.currentTimeMillis() - majorCompactionTime){' – if lowTimestamp is zero, then we'll major compact.)

        We probably shouldn't log if we're returning a zero out of the getLowTimestamp method.

        Would also suggest that getLowTimestamp be renamed getLowestTimestamp and moved into HStore from HRegion since it's only used there (make it private too?).

        Did you mean to do the below in HRegion Billy?

        @@ -867,7 +889,7 @@
            * @throws IOException
            */
           public byte [] compactStores() throws IOException {
        -    return compactStores(false);
        +	  return compactStores(false);
        
        

        Make the above fixes and I'll try it again Billy. We need this patch.
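
        A minimal sketch of the guard suggested above (the getLowestTimestamp rename is the proposal here, not necessarily what the final patch uses): treat a zero lowest timestamp as "unknown" rather than "ancient", so it neither logs nor triggers a major compaction.

        // Illustrative only.
        class MajorCompactionGuardSketch {
          static boolean isMajorCompactionDue(long lowestTimestamp, long majorCompactionTime) {
            if (lowestTimestamp == 0) {
              return false;            // no files (or unknown age): nothing to base the decision on
            }
            return lowestTimestamp < System.currentTimeMillis() - majorCompactionTime;
          }
        }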

        Billy Pearson added a comment -

        This is a working patch for 0.2.1. One of my if statements was wrong; it is corrected in this patch, and I added a little more debug logging to show hours since the last major compaction.
        I would like to see this go into the 0.18.0 version also, so we do not need to patch for 0.18.0 and 0.19.0.
        I believe this will apply to current trunk if you guys want to include it.

        Billy Pearson added a comment -

        Wait on the patch; there seems to be a logic error somewhere. Let me run some more tests. I am seeing major compactions more often than I should be, and not when they should be.

        stack added a comment -

        Patch looks good Billy. Thanks. I want to test on cluster before applying.

        Billy Pearson added a comment -

        Forgot: this patch also includes the max files to compact at one time on a minor compaction. I set the default to 10.

        Billy Pearson added a comment -

        I think I got this working correctly. I tested on my end and it all works OK.
        Since we were doing the minor compaction (incremental compaction) on the HStore level, I did the same for the major compaction.
        I set the default to 1 day; feel free to change that to what you guys think is a correct default.

        2008-08-15 23:34:37,626 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region -ROOT-,,0
        2008-08-15 23:34:37,626 DEBUG org.apache.hadoop.hbase.regionserver.HLog: changing sequence number from 0 to 866762
        2008-08-15 23:34:37,634 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Major compaction triggered on store: 70236052/info
        2008-08-15 23:34:37,682 DEBUG org.apache.hadoop.hbase.regionserver.HStore: started compaction of 1 files into /hbase/-ROOT-/compaction.dir/70236052/info/mapfiles/5245590111629292638
        2008-08-15 23:34:37,819 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/-ROOT-/compaction.dir/70236052/info/mapfiles/5245590111629292638 to /hbase/-ROOT-/70236052/info/mapfiles/8511958703098935844
        2008-08-15 23:34:37,885 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 70236052/info store size is 809.0
        2008-08-15 23:34:37,890 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region -ROOT-,,0 in 0sec
        

        Some of the debug code I removed from the patch outputted the timestamps and folder location of the lowTimestamp files, so I could make sure we were checking the correct folder, the timestamps from the files were in millis, and everything showed up correctly in the right format.

        Please review and let me know.

        Billy Pearson added a comment -

        Changing this to assign to 2.1 and 3.0

        Just noticed we now have a problem of never removing data (deletes, TTL, max versions) from mapfiles if we never compact all the mapfiles at some point.
        Currently the only way we do is after a split, or if the mapfile sizes are just right to include all the mapfiles in the incremental compaction.

        Billy Pearson added a comment -

        HBASE-745 solved the minor compaction with incremental compaction, and it still
        can do major compactions sometimes, but not often.

        The only downside to HBASE-745 is that it does not guarantee a major compaction of the old, larger files will ever happen.
        We do have an option to call the compaction with forced set to true and skip the minor compaction.

        Suggestion to complete the major compaction part:

        1. Add a function in HRegion to return the timestamp of when the oldest file was created, something like HRegion.getOldestHStoreTimestamp().
        2. Add an option (hbase.hregion.majorcompaction) in hbase-default.xml to make major compactions happen every X secs, say default once per day or week.
        3. Compare that setting against the oldest timestamp in HStore.compact and change from force(false) to force(true) when needed, but not in reverse.

        If someone could help with the HRegion.getOldestHStoreTimestamp() function, or point me in the right direction on how to do that in Hadoop,
        I think I could come up with a patch to give us a major compaction and add a limit on the number of regions to compact at one time while we are doing the minor compaction.

        Anything I am missing here stack?
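
        A rough sketch of steps 1-3 above, assuming the proposed HRegion.getOldestHStoreTimestamp() exists and hbase.hregion.majorcompaction holds an interval in milliseconds (both assumptions, not settled API).

        import org.apache.hadoop.conf.Configuration;

        // Illustrative only: upgrade a requested minor compaction to a major one
        // when the oldest store file is older than the configured interval.
        class MajorCompactionTriggerSketch {
          private final long majorCompactionPeriod;

          MajorCompactionTriggerSketch(Configuration conf) {
            // Step 2: the new setting, defaulting to one day as proposed above.
            this.majorCompactionPeriod = conf.getLong("hbase.hregion.majorcompaction", 24L * 60 * 60 * 1000);
          }

          // Step 3: compare the oldest store file's age against the setting, and only
          // ever flip force from false to true, never the reverse.
          boolean shouldForceMajorCompaction(long oldestHStoreTimestamp, boolean force) {
            if (force) {
              return true;
            }
            if (oldestHStoreTimestamp <= 0) {
              return false;                     // no files yet (or unknown), nothing to major compact
            }
            return System.currentTimeMillis() - oldestHStoreTimestamp > majorCompactionPeriod;
          }
        }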


          People

          • Assignee: Billy Pearson
          • Reporter: stack
          • Votes: 0
          • Watchers: 0
