HBASE-138: Under load, regions become extremely large and eventually cause region servers to become unresponsive

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      When attempting to write to HBase as fast as possible, HBase accepts puts at a reasonably high rate for a while, and then the rate begins to drop off, ultimately culminating in exceptions reaching client code. In my testing, I was able to write about 370 10KB records a second to HBase until I reached around 1 million rows written. At that point, a moderate to large number of exceptions - NotServingRegionException, WrongRegionException, region offline, etc. - begin reaching the client code. This appears to be because the retry-and-wait logic in HTable runs out of retries and fails.

      Looking at mapfiles for the regions from the command line shows that some of the mapfiles are between 1 and 2 GB in size, much more than the stated file size limit. Talking with Stack, one possible explanation for this is that the RegionServer is not choosing to compact files often enough, leading to many small mapfiles, which in turn lead to a few overlarge mapfiles. Then, when the time comes to do a split or "major" compaction, it takes an unexpectedly long time to complete these operations. This translates into errors for the client application.

      If I back off the import process and give the cluster some quiet time, some splits and compactions clearly do take place, because the number of regions goes up and the number of mapfiles/region goes down. I can then begin writing again in earnest for a short period of time until the problem begins again.

      Both Marc Harris and I have seen this behavior.

      Attachments

      1. split-v9.patch (17 kB) - stack
      2. split-v8.patch (17 kB) - stack
      3. split-v12.patch (25 kB) - stack
      4. split-v11.patch (24 kB) - stack
      5. split-v10.patch (18 kB) - stack
      6. split.patch (12 kB) - stack


          Activity

          Billy Pearson added a comment -

          A+ job on this patch, guys! This makes HBase much more stable than it was. Good job.

          Hudson added a comment -

          Integrated in Hadoop-trunk #387 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/387/ )

          stack added a comment -

          Committed to TRUNK and backported to 0.16.0.

          stack added a comment -

          Marking as a blocker because without it, folks' initial impression of hbase will be extremely negative (without this patch, hbase struggles under even moderate load; if load is sustained, hbase can go unresponsive; if the loading completes, the hbase cluster will have a few big regions, often way in excess of the configured maximum size, that will never be broken down).

          Bryan Duxbury added a comment -

          I've applied the latest patch and tested again. My import job finished with a minimum of errors, and all my mapfiles are significantly smaller. On top of that, splits are happening much more frequently - 2-4x as often, I would say.

          There may still be other issues lurking around this area of functionality, but I would commit this patch.

          +1

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374583/split-v12.patch
          against trunk revision 616796.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests +1. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1723/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1723/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1723/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1723/console

          This message is automatically generated.

          stack added a comment -

          Retrying Hudson to get more info on why it failed.

          stack added a comment -

          Patch also includes accommodation of Jim's review comments: removing the HConstant change and fixing a bad method name.

          stack added a comment -

          I don't know why it failed in TTI. Tried it locally again and it's fine. Enabled mapred logging so we can see better why it failed.

          Jim Kellerman added a comment -

          Reviewed patch. +1

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374528/split-v11.patch
          against trunk revision 616796.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1720/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1720/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1720/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1720/console

          This message is automatically generated.

          stack added a comment -

          Trying Hudson again.

          stack added a comment -

          Patch that passes TTMR and TTI. Splits get cleaned up really fast now. Changed the multi-region maker so it will work even if the parent region has been deleted already by the time it goes looking for it.

          Also, chatting w/ Bryan, the numbers he pasted above were from a run that did not have split-v8.patch in place.

          stack added a comment -

          This version makes use of the old property that used to hold the split/compactor thread's wait interval, using it with 20 seconds instead of 15. It means we take on writes more slowly, but doing the math (how long flushes take, the intervals at which they run, how long a compaction generally takes, etc.), this should make things more robust.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12374498/split-v9.patch
          against trunk revision 616796.

          @author +1. The patch does not contain any @author tags.

          javadoc +1. The javadoc tool did not generate any warning messages.

          javac +1. The applied patch does not generate any new javac compiler warnings.

          release audit +1. The applied patch does not generate any new release audit warnings.

          findbugs +1. The patch does not introduce any new Findbugs warnings.

          core tests +1. The patch passed core unit tests.

          contrib tests -1. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1717/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1717/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1717/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1717/console

          This message is automatically generated.

          Bryan Duxbury added a comment -

          I may have spoken too soon. After a bit of a slowdown around 45%, some splits burst through and the writing rate increased back to what I expected it would be.

          Right at the end of the job, I still have a bunch of big mapfiles:

          [rapleaf@tf1 hadoop]$ bin/hadoop dfs -lsr / | grep test_table | grep "mapfiles/[^/]*/data" | grep -v compaction.dir | awk '{print $4}' | sort -n | awk '{print $1 / 1024 / 1024}'
          18.6529
          18.987
          20.3924
          25.5912
          30.4755
          32.5393
          57.0985
          60.0075
          60.2728
          61.7568
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2235
          64.2235
          64.2432
          64.2432
          69.8449
          75.3975
          76.5179
          77.766
          79.3581
          81.8543
          82.6503
          83.0631
          88.94
          90.8564
          92.5664
          97.2247
          101.814
          104.703
          105.116
          110.62
          113.814
          127.543
          128.427
          128.427
          128.427
          128.427
          128.516
          353.175
          367.907
          471.401
          474.664
          575.348
          657.9
          906.067
          921.349
          1578.89
          

          25 minutes later, I've had a few more splits, getting me up to 23 regions overall, with only 40 mapfiles. Some of the files are still much larger than they should be.

          I definitely see this as having been an improvement, but I don't think it's the whole way there yet.

          Bryan Duxbury added a comment -

          After about 45% of 1 million 10KB rows imported, the import started to slow down markedly. I did a little DFS digging to get a sense of the size of mapfiles:

          [rapleaf@tf1 hadoop]$ bin/hadoop dfs -lsr / | grep test_table | grep "mapfiles/[^/]*/data" | grep -v compaction.dir | awk '{print $4}' | sort -n | awk '{print $1 / 1024 / 1024}'
          0
          0.589743
          21.5422
          29.4829
          36.4409
          36.834
          54.6908
          56.6071
          60.0075
          61.7568
          64
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.2137
          64.3218
          65.3046
          68.1251
          68.9211
          71.2503
          73.2158
          73.9037
          77.5301
          82.1786
          83.0631
          83.1417
          88.94
          92.9497
          98.2762
          111.76
          112.399
          116.162
          119.337
          127.572
          128.496
          657.9
          760.569
          1261.14
          1564.22
          

          (If you can't read awk, that's the size in megabytes of each mapfile in the DFS for my test table.)

          There are only 7 regions, and the biggest mapfile is almost 1.5 GiB. I will report again when the job has completed and the cluster has had a chance to cool down.

          stack added a comment -

          Passes tests locally.

          I think this issue is a blocker but won't mark it so until Bryan Duxbury's test says this patch is an improvement over the old behavior (and jimk said he'd review). Meantime, moving it to Hudson to make sure it's OK there in case we end up committing.

          stack added a comment -

          v9 adds one line to TestCompaction, setting the old config, so it will pass.

          stack added a comment -

          Load doesn't have to be extreme.

          stack added a comment -

          Problem: We have no governor on flushes, so it's possible – especially after all the performance improvements since 0.15.x – to flush at a rate that overwhelms the rate at which we compact store files.

          The attached patch does not solve the overrun problem. It does its best to ameliorate the problem by making splits happen more promptly, and then, post split, it makes it so that when a region is quiescent, if it has store files of > 256M – even if only one of them – we'll split.

          More detail on patch:

          + HADOOP-2712 set a flag such that if a split was queued, we'd stop further compactions. Turns out that more often than not, it was possible for one last compaction to start before the split ran. This patch puts the compacting and splitting thread back together; a check of whether a split is needed will always follow a compaction check (event-driven, this was not guaranteed when the threads were distinct). Also made it possible to split even though no compaction was run (removed the 2172 addition; it was too subtle).
          + Flushes could also get in the way of a split, so now flushes are blocked too when a split is queued.
          + On open, check if the region needs to be compacted (previously this check was only done after the first flush – which could be 20/30s out).
          + Made it so we split if > 256M, not if > 1.5 * 256M. Set the multiplier on flushes to be 1 instead of 2 so we flush at 64M, not 64M plus some slop. Regularizes splits and flushes.
          + Make it so we'll split even if only one file is > 256M, and that we'll compact even if there is only one file but it has references to the parent region.

          I tried Billy's suggestion of putting a cap on the number of mapfiles to compact in one go. We need to do more work though before we can make use of this suggested technique, because regions that hold references are not splittable: I was compacting the N oldest, then on subsequent compactions would do the N oldest again, but the remainder could have references to parent regions and so couldn't be split. Meantime we'd accumulate flush files – the region would never split and the count of flush files would overwhelm the compactor. We need to be smarter and, as Billy suggests, pick up the small files.
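
          For readers following along, here is a minimal sketch of the size-based split and compaction checks described above: split as soon as any single store file exceeds the configured maximum (no 1.5x slop), and compact even a lone file if it still holds references to the parent region. This is an editorial illustration only; the names (StoreFileInfo, DESIRED_MAX_FILE_SIZE, needsSplit) are assumptions, not the actual patch code.

          import java.util.List;

          // Illustrative sketch only; names and structure are assumptions, not HBase's actual classes.
          class SplitPolicySketch {
            static final long DESIRED_MAX_FILE_SIZE = 256L * 1024 * 1024; // 256M, per the comment above

            /** Stand-in for per-store-file metadata. */
            record StoreFileInfo(String path, long sizeBytes, boolean hasReferencesToParent) {}

            /** Split if any single store file exceeds the configured maximum. */
            static boolean needsSplit(List<StoreFileInfo> storeFiles) {
              return storeFiles.stream().anyMatch(f -> f.sizeBytes() > DESIRED_MAX_FILE_SIZE);
            }

            /** Compact when over the file-count threshold, or when a single remaining file
                still references the parent region (regions with references cannot be split). */
            static boolean needsCompaction(List<StoreFileInfo> storeFiles, int compactionThreshold) {
              if (storeFiles.size() >= compactionThreshold) {
                return true;
              }
              return storeFiles.size() == 1 && storeFiles.get(0).hasReferencesToParent();
            }
          }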

          stack added a comment -

          HADOOP-2636 doesn't seem to help with this problem in particular.

          Running 8 clients doing PerformanceEvaluation, what I'm looking for is a steady number of files to compact on each run.

          Here are first four compactions before the patch:

          2008-01-30 07:03:06,803 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 3 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 530893190/info
          2008-01-30 07:04:25,345 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 3 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 530893190/info
          2008-01-30 07:06:35,573 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 3 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 530893190/info
          2008-01-30 07:11:14,999 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 9 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 560724365/info
          

          A split ran in between the 3rd and 4th compaction.

          Here are first four compactions after application of patch:

          2008-01-30 06:43:17,972 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 4 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 1984834473/info
          2008-01-30 06:44:54,734 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 3 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 1984834473/info
          2008-01-30 06:48:53,389 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 7 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 712183868/info
          2008-01-30 06:53:25,746 DEBUG org.apache.hadoop.hbase.HStore: started compaction of 9 files using hdfs://12.:9123/hbase123/TestTable/compaction.dir for 712183868/info
          
          stack added a comment -

          One problem I've seen is that the compaction runs, a split is queued and needs to run because we exceed thresholds, but during the compaction we dumped a bunch of flushes – so many that we exceed the compaction threshold and it happens to run again before the split has had a chance to run (HADOOP-2712 made it so this situation does not cascade – the splitter will run after the second split – but it's not enough).

          Patch that puts the compactor and splitter threads back together again so that a new compaction will not run if a split is needed.

          In testing, it needs more work. Putting it aside for the moment to see if HADOOP-2636 helps with this issue.
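
          A rough sketch of the combined compact-then-split worker described above (illustrative only; RegionHandle, compactIfNeeded and splitIfNeeded are assumed names, not the actual patch code). The point is that with a single thread, the split check always follows the compaction check, so a new compaction can never run ahead of a needed split.

          import java.util.concurrent.BlockingQueue;
          import java.util.concurrent.LinkedBlockingQueue;

          // Illustrative sketch only; not actual HBase code.
          class CompactSplitThreadSketch extends Thread {

            /** Stand-in for the operations a region exposes in this sketch. */
            interface RegionHandle {
              void compactIfNeeded();  // assumed: compact when over the file-count threshold
              void splitIfNeeded();    // assumed: split when store files are too large
            }

            private final BlockingQueue<RegionHandle> work = new LinkedBlockingQueue<>();

            void request(RegionHandle region) {
              work.offer(region);
            }

            @Override
            public void run() {
              try {
                while (!isInterrupted()) {
                  RegionHandle region = work.take();
                  // One thread does both jobs, so the split check always follows the
                  // compaction check and cannot be starved by back-to-back compactions.
                  region.compactIfNeeded();
                  region.splitIfNeeded();
                }
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
            }
          }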

          Bryan Duxbury added a comment -

          I've been taking a look at the RegionServer side of things to try and understand why a split wouldn't occur. Some code:

          if (e.getRegion().compactIfNeeded()) {
            splitter.splitRequested(e);
          }
          

          We only queue a region to be split if it's just been compacted. I assume the rationale here is that unless a compaction occurred, there'd be no reason to split in the first place. I'm not convinced that's true, however. A store will only compact if it has more mapfiles than the compaction threshold, which for some of my regions wasn't the case - the individual mapfiles were 1.5GiB, but there were only 2. As a result, compaction and thus splitting was skipped. Shouldn't we be testing whether the overall size of the mapfiles makes splitting necessary, rather than letting the compaction determine whether we do anything?

          Perhaps we should add an optional compaction. Instead of testing HStore.needsCompaction, which only checks if it is above the compaction threshold, maybe we should also have an isCompactable, which just checks if there is more than one mapfile. The optional compactions could happen behind mandatory, threshold-based compactions. Then, we could always put an HStore on the compact queue whenever there is an event that would cause a change to the number of mapfiles, with the constraint that if the store is already on the compact queue, we don't re-add it.

          If we did all of that, then it would probably put us in the right state to keep the split thread doing exactly what it is doing right now, but splits will also happen in downtime.
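
          A minimal sketch of what that proposal might look like (purely illustrative, based only on the description above; Store, isCompactable and the queueing behavior are assumptions, not existing HBase code):

          import java.util.Set;
          import java.util.concurrent.ConcurrentHashMap;
          import java.util.concurrent.ConcurrentLinkedQueue;

          // Illustrative sketch of the proposed "optional compaction" queueing; not actual HBase code.
          class OptionalCompactionQueueSketch {
            interface Store {
              int mapFileCount();
              int compactionThreshold();
            }

            /** Mandatory: over the configured threshold. */
            static boolean needsCompaction(Store store) {
              return store.mapFileCount() >= store.compactionThreshold();
            }

            /** Optional: anything with more than one mapfile could usefully be compacted. */
            static boolean isCompactable(Store store) {
              return store.mapFileCount() > 1;
            }

            private final ConcurrentLinkedQueue<Store> compactQueue = new ConcurrentLinkedQueue<>();
            private final Set<Store> queued = ConcurrentHashMap.newKeySet();

            /** Call on any event that changes the number of mapfiles (flush, split, open...). */
            void maybeEnqueue(Store store) {
              // Don't re-add a store that is already waiting to be compacted.
              if (isCompactable(store) && queued.add(store)) {
                compactQueue.offer(store);
              }
            }
          }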

          Billy Pearson added a comment - edited

          I am talking about a theory I have about hot spots like this over at
          HADOOP-2615

          Take a look and see what you think about the new idea I had on the compaction process, and comment if you like it or have anything better.
          I think this would solve the problem: if we only compacted a few of the newest map files at a time, the splitter would check the region more often to see if it needs a split, and do so if needed.
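
          For illustration only, a sketch of that idea (hypothetical names; not code from HADOOP-2615 or HBase): compact just the N newest mapfiles each pass, so the splitter gets a chance to check the region between passes.

          import java.util.Comparator;
          import java.util.List;

          // Illustrative sketch of "compact only a few of the newest mapfiles per pass"; not actual code.
          class PartialCompactionSketch {
            record MapFileRef(String path, long sequenceId) {} // newer files assumed to have higher sequence ids

            /** Pick at most maxFilesPerCompaction of the newest mapfiles for this pass. */
            static List<MapFileRef> selectNewest(List<MapFileRef> files, int maxFilesPerCompaction) {
              return files.stream()
                  .sorted(Comparator.comparingLong(MapFileRef::sequenceId).reversed())
                  .limit(maxFilesPerCompaction)
                  .toList();
            }
          }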


            People

            • Assignee: Unassigned
            • Reporter: Bryan Duxbury
            • Votes: 0
            • Watchers: 1
