HBase
HBASE-64

Add max number of mapfiles to compact at one time, giving us a minor & major compaction

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Later
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels:
      None

      Description

      Currently we run a compaction on a region when hbase.hstore.compactionThreshold is reached (default 3).

      I think we should configure a max number of mapfiles to compact at one time, similar to a minor compaction in Bigtable. This keeps compactions from getting tied up in one region too long, which otherwise lets other regions pile up far too many memcache flushes and makes compaction take longer and longer for each region.

      If we did that, then when a region's updates start to slack off, the max number would eventually include all mapfiles, causing a major compaction on that region. Unlike Bigtable, this would leave the master out of the process and let the region server handle the major compaction when it has time.

      When doing a minor compaction on a few files, I think we should compact the newest mapfiles first and leave the larger/older ones for when a region has few updates.
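
      A rough sketch of the selection this proposes, assuming store files ordered oldest to newest. Only hbase.hstore.compactionThreshold is an existing setting here; the hbase.hstore.compaction.max property, the class, and the method are hypothetical illustrations, not HBase code:

          import java.util.List;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;

          // Illustrative only: pick a bounded set of files for one compaction pass.
          class BoundedCompactionSelection {
            // `storefiles` is assumed to be ordered oldest -> newest.
            static List<Path> chooseFilesToCompact(List<Path> storefiles, Configuration conf) {
              int threshold = conf.getInt("hbase.hstore.compactionThreshold", 3);
              // Hypothetical knob capping how many files one compaction may take on.
              int maxFiles = conf.getInt("hbase.hstore.compaction.max", 2 * threshold);
              if (storefiles.size() <= maxFiles) {
                // Few enough files: compact them all, which amounts to a major compaction.
                return storefiles;
              }
              // Otherwise take only the newest (typically smallest) files, leaving the
              // older, larger ones for a quieter time.
              return storefiles.subList(storefiles.size() - maxFiles, storefiles.size());
            }
          }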

      Attachments

      1. twice.patch
        1 kB
        stack
      2. flag.patch
        2 kB
        stack
      3. flag-v2.patch
        4 kB
        stack

        Issue Links

          Activity

          Billy Pearson added a comment -

          I think we need to watch and make sure the split check is not launched on a minor compaction.

          Example:
          Say we have a freshly split region and it starts getting heavy updates and memcache flushes. If we have not compacted the old HStores from the original split, it would likely not be wise to split again.

          Jim Kellerman added a comment -

          Blocking this issue to see if HADOOP-2636 resolves it.

          stack added a comment -

          Doing some load testing, I'm seeing compactions take longer and longer as reported above, but I'm also seeing that the region won't split. It just goes from one compaction to the next, with each one covering more files and taking longer each time.

          2008-01-25 01:16:45,850 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 55sec
          2008-01-25 01:18:37,347 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 1mins, 49sec
          2008-01-25 01:21:42,010 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 3mins, 4sec
          2008-01-25 01:27:20,417 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 5mins, 38sec
          2008-01-25 01:37:55,330 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 10mins, 34sec
          

          Looking more into this.

          stack added a comment -

          Here are the next two compactions:

          2008-01-25 01:58:30,280 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 20mins, 34sec
          2008-01-25 02:17:28,818 INFO org.apache.hadoop.hbase.HRegion: compaction completed on region TestTable,,1201223677355. Took 18mins, 58sec
          

          The split finally runs, 15 minutes after the loading finishes and an hour after it was initially scheduled. The split is frozen out waiting here in HRegion:

                  while (writestate.compacting || writestate.flushing) {
                    LOG.debug("waiting for" +
                        (writestate.compacting ? " compaction" : "") +
                        (writestate.flushing ?
                            (writestate.compacting ? "," : "") + " cache flush" :
                              ""
                        ) + " to complete for region " + regionName
                    );
                    try {
                      writestate.wait();
                    } catch (InterruptedException iex) {
                      // continue
                    }
                  }
          
          stack added a comment -

          Patch that compacts at most twice the compaction threshold. Testing to see if the split thread gets a chance to run when we put an upper bound on the number of files to compact per invocation.

          Billy Pearson added a comment -

          That's what I see too: the split never happens when a region is under insert load. I still think that if we are going to have transaction speed close to Bigtable's, we will need to add a limit on the number of map files to compact at one time.
          Even if HADOOP-2636 gets the flushing working right from a performance point of view, I think this should be included anyway as a way to handle a large number of regions per server.

          I am seeing 10-15 mins to run a compaction on a 90MB region using block compression.
          Consider that most will want to handle more than 25-50 regions per server.

          Say the average region server holds 100 regions; that works out to 100 * 10 mins = 1000 mins, roughly 16 hours, to run a full compaction on all the regions.
          With this in place, the map files on regions getting heavy update traffic will not get out of control.

          100 regions with a 90MB average size is only about 9GB of compressed data.
          Closer to a production release I would like to see a better compression method used.
          That would help with compaction speed; right now my bottleneck on compaction is compression.

          New idea:

          After thinking on this a little, I'm not sure basing compaction on the number of map files is the best way to go.
          Compacting 3-6 small 1-2MB map files does not take that long even with compression, so the ideal way to do this would be to only compact small files while we have small files to compact, leaving the larger map files to be compacted later when load is not as high.

          Bigtable has the right idea: only do a full/major compaction of all the map files every so often, to remove deleted data or data outside its max version range.
          So we might want to look at replacing the map-file-count trigger with a limit on the size of the map files. For example, say a region family has a compaction max size of 16MB: we would only compact files under that size, and once a compacted file grows past the max compaction size we would not include it in the next compaction. This would leave map files of around the same size to be compacted together, say once a day and/or after splits.
          Also, I would like to keep the region servers handling compaction on their own so the master can be left alone to do other, more important tasks.

          Currently, if you load a region server with many regions, it will always be running compactions on those regions if they are getting data inserted.
          So this would lessen the load on the hard drives, memory, and CPUs, giving more resources for faster/more transactions.
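
          A rough sketch of that size-based selection, under the 16MB-per-family assumption above (the MapFileInfo type, the constant, and the method are illustrative, not HBase code):

              import java.util.ArrayList;
              import java.util.List;

              // Illustrative: only map files under the max size are candidates, so large,
              // already-compacted files are left alone until a periodic major compaction.
              class SizeBasedSelection {
                static class MapFileInfo {
                  final String path;
                  final long sizeBytes;
                  MapFileInfo(String path, long sizeBytes) {
                    this.path = path;
                    this.sizeBytes = sizeBytes;
                  }
                }

                static final long MAX_COMPACTION_BYTES = 16L * 1024 * 1024; // assumed 16MB cap

                static List<MapFileInfo> chooseSmallFiles(List<MapFileInfo> files) {
                  List<MapFileInfo> candidates = new ArrayList<MapFileInfo>();
                  for (MapFileInfo f : files) {
                    if (f.sizeBytes < MAX_COMPACTION_BYTES) {
                      candidates.add(f); // small flush files are cheap to merge
                    }
                  }
                  return candidates;
                }
              }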

          stack added a comment -

          This is a better patch. It moves the setting of writesEnabled to before we go into the wait. But it's still not right: we start to split, but writesEnabled also stops flushing, so memcache fills up, the block-updates gate comes down, and clients time out. We need a flag that stops another compaction from running but still lets flushes happen.
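
          A minimal sketch of the kind of flag that calls for, with compactions gated separately from writesEnabled so flushes keep running while a split waits (the fields and methods here are illustrative, not the actual patch):

              // Illustrative write-state holder; not the actual HRegion state.
              class WriteState {
                volatile boolean compacting = false;        // a compaction is in progress
                volatile boolean flushing = false;          // a memcache flush is in progress
                volatile boolean writesEnabled = true;      // client updates allowed
                volatile boolean compactionsEnabled = true; // hypothetical new flag

                // Called when a split/close is pending: stop scheduling new compactions,
                // but leave writesEnabled alone so updates and flushes keep running.
                synchronized void disableCompactions() {
                  compactionsEnabled = false;
                  notifyAll();
                }

                synchronized boolean compactionAllowed() {
                  return compactionsEnabled && !compacting;
                }
              }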

          stack added a comment -

          Adds a new "disable compactions" flag. It gets set when trying to close a region so that no more compactions can be scheduled. Tested under load, and splits happen again.

          stack added a comment -

          I made issue HADOOP-2712 to cover not-splitting under load.

          Billy, in the Bigtable paper, I believe what we call a flush is a minor compaction in gwhogle-speak, and what they call a merging compaction is compacting a few store files together with what's in memcache.

          > When doing a minor compaction on a few files, I think we should compact the newest mapfiles first and leave the larger/older ones for when a region has few updates.

          Why do you think newer rather than older, Billy?

          > I still think if we are going to have transaction speed close to Bigtable's we will need to add a limit on the number of map files to compact at one time.

          I agree, given the compaction times posted above.

          By the way, I tried out my simple upper-bound patch that puts a cap of 2*compactionThreshold on the number of files to compact at once. It seems to work, with messages like the one below showing up from time to time:

          2008-01-25 20:44:38,330 DEBUG org.apache.hadoop.hbase.HStore: Count of files to compact in 2052803679/info is 8 which is > twice compaction threshold of 3. Compacting 6 only
          

          FYI, the regionserver runs compaction. The master has no say at the moment.

          Billy Pearson added a comment -

          Example of a high number of updates on hot regions:

          Say I have many regions, say 100, on a server; a few are getting a lot of updates and the others are getting some.

          The few getting the bulk of the updates will have many map files, so their compactions will hit the 6-map-file limit, but they will be quick to finish since we will only be working with say 16-20MB, not 64MB+.
          That is, if we go from new to old, leaving out the oldest map files, which are the largest and take the longest to compact.

          The other regions will still tie up the compaction thread for say 10 mins each, even with only 3-4 map files, because their compactions will include the larger map files.
          In that time the regions that are getting lots of updates will be flushing more often, meaning they will build up many map files.
          We will be spending most of our time compacting regions that have only a few map files (including the larger ones that take the longest to compact) instead of the regions that have the most map files to compact.

          In my example above, if all or most of the regions flushed a map file and entered the queue for compaction, it would be 16 hours before we got back to the few regions that had been getting the bulk of the updates. Then when we got back to them we would only process 6 of their map files again, leaving many map files for the next compaction, and we would loop through all the others again, assuming they each got a few flushes over the 16 hours it took to complete the compaction on all the regions.

          We should try to come up with a simple test outside of Hudson to get real numbers on the time it takes to scan a region, say running the test with 10, 20, 100, 500 map files.

          With my new idea above we could keep the number of map files under control by only compacting map files under X size, keeping compactions fast. The test may show that we can handle say 50 medium-size map files during a scan without much impact on speed. If that is the case, then we may not need a major compaction, where we merge all the map files together, more than once every few days, except after a split, when we would want to do a major compaction soon to remove the out-of-range data from each new region.

          The bottleneck I have seen on compaction is with block compression: we are bound by CPU speed gzipping the map file after compaction. So I would rather run one large compaction every day or two, and only have to gzip the biggest part of each region every few days, instead of every day or more than once a day. In my mind, gunzipping 64MB of data, adding 4MB, and gzipping it again many times a day is wasting CPU time on gzipping the same data over and over.

          My idea here is to spend more compaction time on the regions getting more updates than on the other regions, so we can handle more regions per server.
          My example above is based on 100 regions totaling 9GB of compressed data. With that kind of number per server, someone wanting to store a TB of compressed data in HBase would need a very large number of servers or would have to keep update traffic low.

          I know we have other issues affecting how many regions a server can handle, like the open-file limits per server, but I would like to see this compaction problem fixed once, with the most efficient compaction we can provide for all users, so it does not become an issue later down the road. In the end, if we go with this new idea, compactions would be faster and use fewer resources during bulk updates, leaving more resources for other tasks running on the server, like map tasks.

          So my proposal would be to have two types of compaction:

          1. Compact new flushes into one map file until it reaches a set size in MB, then leave it for the compaction below.
          2. Compact all map files for a region together once every X days, or when we are a child region from a split.

          Billy Pearson added a comment -

          Note I am not thinking of having two compaction threads, just two options a compaction can take. We still queue up a compaction check on a memcache flush, but we run compaction #2 above if the oldest map file is more than X days old; otherwise we check whether there are 2 or more new map files smaller than the max compaction MB setting and run #1 if so.
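
          A rough sketch of that decision, under the assumptions laid out above (the MapFileInfo type, the constants, and the method are illustrative, not HBase code):

              import java.util.List;

              // Illustrative check run after each memcache flush to pick a compaction type.
              class CompactionDecision {
                static class MapFileInfo {
                  final long sizeBytes;
                  final long createdMs;
                  MapFileInfo(long sizeBytes, long createdMs) {
                    this.sizeBytes = sizeBytes;
                    this.createdMs = createdMs;
                  }
                }

                enum Kind { NONE, MINOR, MAJOR }

                static final long MAX_MINOR_BYTES = 16L * 1024 * 1024;  // assumed per-file cap
                static final long MAJOR_AGE_MS = 24L * 60 * 60 * 1000;  // assumed "X days" = 1 day

                static Kind choose(List<MapFileInfo> files, long nowMs) {
                  long oldest = Long.MAX_VALUE;
                  int smallFiles = 0;
                  for (MapFileInfo f : files) {
                    oldest = Math.min(oldest, f.createdMs);
                    if (f.sizeBytes < MAX_MINOR_BYTES) {
                      smallFiles++;
                    }
                  }
                  if (!files.isEmpty() && nowMs - oldest > MAJOR_AGE_MS) {
                    return Kind.MAJOR;                // #2: merge everything every X days
                  }
                  return smallFiles >= 2 ? Kind.MINOR // #1: merge only the small, new files
                                         : Kind.NONE;
                }
              }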

          Bryan Duxbury added a comment -

          It appears these two issues are probably related. I actually thought that this one had been fixed and committed already, but I guess not.

          Billy Pearson added a comment -

          I have a second idea on this to help with hot spots.

          If we could add a way to set a priority for compactions, it would help with hot-spot regions building up too many map files from flushes.

          Example: if we have a region with 25 map files and one with 10, the region with 25 would have a priority of 25 and the one with 10 a priority of 10, so we would compact the region with 25 first.

          If we could add/update the priority when we do a flush, then the compactor could work on the regions that need it most, in order.
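
          A minimal sketch of that ordering, assuming a queue keyed on map-file count (the RegionEntry type and the methods are illustrative, not HBase code):

              import java.util.Comparator;
              import java.util.PriorityQueue;

              // Illustrative compaction queue that serves the region with the most map files first.
              class CompactionQueue {
                static class RegionEntry {
                  final String regionName;
                  final int mapFileCount; // updated on each memcache flush
                  RegionEntry(String regionName, int mapFileCount) {
                    this.regionName = regionName;
                    this.mapFileCount = mapFileCount;
                  }
                }

                // Highest map-file count first.
                private final PriorityQueue<RegionEntry> queue = new PriorityQueue<>(
                    Comparator.comparingInt((RegionEntry e) -> e.mapFileCount).reversed());

                synchronized void requestCompaction(String regionName, int mapFileCount) {
                  // Re-queue with the latest count; a flush would call this.
                  queue.removeIf(e -> e.regionName.equals(regionName));
                  queue.add(new RegionEntry(regionName, mapFileCount));
                }

                synchronized RegionEntry next() {
                  return queue.poll(); // region most in need of compaction, or null if empty
                }
              }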

          stack added a comment -

          There is a bunch of good stuff above, but it's hard to tell which of it still applies since HBASE-745 went in. There is also some overlap with HBASE-775.

          Billy Pearson added a comment -

          I think HBASE-745 did basically what I was looking for above; once HBASE-775 is resolved, this issue should be closed.

          The only other item above I had an idea for was changing the compaction order to handle the region with the most map files first.
          With the speed gain from HBASE-745 I do not think that is needed; even if we had 100 regions per server, I think compaction would keep up now.

          So unless anyone else is seeing backlogs in compaction now, I think the compaction bottlenecks have been solved.

          stack added a comment -

          Opened HBASE-834 to cover the original intent of this issue. Closing this one since it got pulled all around the place.


            People

            • Assignee: Unassigned
            • Reporter: Billy Pearson
            • Votes: 0
            • Watchers: 0
