HBASE-10501

Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.17
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changes the default split policy to avoid too many regions with default settings.
      The old policy calculates the split size at each RS as follows: MIN(maxFileSize, flushSize*NoRegions^2) (NoRegions is the number of regions for the table in question seen on this RS)

      The new policy calculates the size this way: MIN(maxFileSize, flushSize*2*NoRegions^3)
      Note that the initial split size is now 2 * the flushSize. With default settings it increased from 128mb to 256mb.

      The new policy still allows spreading out the regions over the cluster quickly, but then grows the desired size fairly quickly in order to avoid too many regions per RS.
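
      To make the before/after formulas concrete, here is a minimal sketch in Java (illustrative only, not the committed code; the method names are made up):

        // Old default: MIN(maxFileSize, flushSize * NoRegions^2)
        static long oldSplitSize(long flushSize, long maxFileSize, int regions) {
          return Math.min(maxFileSize, flushSize * regions * regions);
        }

        // New default: MIN(maxFileSize, 2 * flushSize * NoRegions^3)
        static long newSplitSize(long flushSize, long maxFileSize, int regions) {
          return Math.min(maxFileSize, 2 * flushSize * regions * regions * regions);
        }

      With the default 128mb flush size, newSplitSize yields 256mb for a single region, matching the note above.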

      Description

      During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to the IncreasingToUpperBoundRegionSplitPolicy.

      The current logic is this (from the comments):
      "regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller"

      So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size.
      With a 10gb max file size it is still 9 regions of the same table.
      Considering that the number of regions that an RS can carry is limited and there might be multiple tables, this should be more configurable.
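
      (To spell out the arithmetic behind those counts, here is a hypothetical helper; the name is made up for illustration:)

        // Regions of one table needed on an RS before flushSize * N^2 reaches maxFileSize:
        // N = ceil(sqrt(maxFileSize / flushSize))
        static int regionsToReachMax(long flushSize, long maxFileSize) {
          return (int) Math.ceil(Math.sqrt((double) maxFileSize / flushSize));
        }
        // 20gb / 128mb = 160 -> sqrt(160) ~ 12.7 -> 13 regions
        // 10gb / 128mb =  80 -> sqrt(80)  ~  8.9 ->  9 regions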

      I think the squaring is smart and we do not need to change it.

      We could

      • Make the start size configurable and default it to the flush size
      • Add a multiplier for the initial size, i.e. start with n * flushSize
      • Also change the default to start with 2*flush size

      Of course one can override the default split policy, but these seem like simple tweaks.

      Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if max size is 20gb and the goal is three we'd start with 20g/9 = 2.2g for the initial region size.

      stack, I'm especially interested in your opinion.

      1. 10501-0.94-v5.txt
        5 kB
        Lars Hofhansl
      2. 10501-trunk-v2.txt
        5 kB
        Lars Hofhansl
      3. 10501-trunk.txt
        7 kB
        Lars Hofhansl
      4. 10501-0.94-v4.txt
        3 kB
        Lars Hofhansl
      5. 10501-0.94-v3.txt
        3 kB
        Lars Hofhansl
      6. 10501-0.94-v2.txt
        3 kB
        Lars Hofhansl
      7. 10501-0.94.txt
        2 kB
        Lars Hofhansl

        Activity

        Lars Hofhansl created issue -
        stack added a comment -

        The original motivation was to split quickly, initially, so regions got farmed out across the cluster sooner and the full cluster got in on the action faster than if, say, you had to wait till one region hit the max size.

        Was also hoping to keep the logic simple – easy to understand – and was trying to make it so you'd not have to touch this splitter going forward.

        As you point out, 13 splits to reach max seems too many, especially if there are many tables on the one server. Suggestions for when to split the first time all seem fine – just pick one I'd say.

        What if we tripled rather than doubled?

        Lars Hofhansl added a comment -

        You mean do number of regions ^ 3? Or start with 3*flushsize?

        Flush size might just be too small as an initial size; it is set due to memory considerations, whereas the max size is set due to disk IO considerations. Until we start to see multiple regions of a table on an RS we'd split on each flush.

        So many options

        Lars Hofhansl added a comment - - edited

        Starting with flush size means that every flush will cause a split until we see multiple regions of a table on an RS.
        Maybe just starting with 2*flushsize would avert that.

        Lars Hofhansl added a comment -

        Here's a trivial patch that allows setting the initial size.

        Or is it simpler to tweak the default to 2*flushsize?

        Lars Hofhansl made changes -
        Attachment 10501-0.94.txt [ 12628381 ]
        Lars Hofhansl added a comment -

        Thinking about this a bit more. It is probably fairly unlikely that an RS will see 3 or 4 regions of the same table, unless that table is already spread over the cluster. So I think we should ensure we're pretty close to the max file size when we see that many regions.
        In most setups the ratio of maxFileSize to flushSize is somewhere around 100:1. (10g regions and 128mb flushsize makes this 80.)

        So if we start with 3*flushsize we get an initial 30:1 ratio and thus we'd reach the max at about 5 (sqrt(30) = 5.4...)
        That would also avoid the initial split storm after each flush.

        Should I change the initial size to 3 or 4 x the flush size, instead of making it configurable at all?

        Lars Hofhansl added a comment -

        OK, so here's my proposal:

        • By default we set the initial size to 2*flushSize
        • Allow the initial size to be configured

        Starting with flush size just makes for a bad experience when folks just play around with loading a bunch of data the first time. It leads to too aggressive splitting. Just adding the factor of two here means that not every flush will result in a split until we see multiple regions per RS.

        Lars Hofhansl made changes -
        Attachment 10501-0.94-v2.txt [ 12628952 ]
        stack added a comment -

        lgtm. When you do your little squaring math, it's sqrt(50), which is 7? Still too much? Cube root is between 3 and 4. Would that be better?

        Lars Hofhansl added a comment -

        Cubing could be better indeed. Was thinking not to change too much. But maybe we should.

        So with initialSize = 2*flushSize and cubing we'd get the following by default (128m memstores, 10g regions):
        256m, 2048m, 6912m, 10g
        With squaring we'd get
        256m, 1024m, 2304m, 4096m, 6400m, 9216m, 10g
        With 4*flushsize and squaring it's:
        512m, 2048m, 4608m, 8192m, 10g

        Not sure. Looks like 2*flushSize + cubing is best. When the cluster is sparsely used we spread quickly, but also grow quickly once we start seeing multiple regions. Let's do that then?
        As I said this is fuzzy and there is no right or wrong.

        Do we have to worry about numerical overflow? We'd blow past 2^63 after a few thousand regions, depending on flush size. Maybe clamp to the max file size after 100 regions.
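
        One possible shape for that clamping (a sketch only, not the actual patch): do the multiplication stepwise and bail out as soon as the running product passes the cap.

          // Hypothetical overflow-safe MIN(maxFileSize, initialSize * N^3)
          static long sizeToCheck(long initialSize, long maxFileSize, int regionCount) {
            if (regionCount == 0 || regionCount > 100) {
              return maxFileSize; // clamp large counts rather than risk long overflow
            }
            long size = initialSize;
            for (int i = 0; i < 3; i++) {
              size *= regionCount;    // builds initialSize * N^3 one factor at a time
              if (size > maxFileSize) {
                return maxFileSize;   // already past the cap; stop before overflowing
              }
            }
            return size;
          }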

        One bit of information. In our test on a 9 RS/DN cluster we loaded 1bn rows/250gb and ended up with 171 regions. I.e. 1.4g on average and 19 per region server. Definitely not good - and we have 256mb flush size and 10g max file size. Now, 250gb is not exactly a lot of data, but it illustrates the point.
        (It's much more than our math here predicts; presumably because some of the RSs have fewer regions at times, so they split even faster.)

        Maybe I can get our perf folks to do some testing.

        Enis Soztutar added a comment -

        +1 for changing the default. 2*flushSize + cube looks good.
        I think we should also go for more global split decisions, rather than local decisions. But this is for another issue.

        Lars Hofhansl added a comment -

        Marking critical as the current default will lead to a bad experience for everyone using HBase the first time and loading a lot of data.

        Lars Hofhansl made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        Lars Hofhansl added a comment -

        v3

        • cubes the count
        • starts with 2 x flushSize
        • makes initial size configurable

        Please have a look. Now that the first 0.94.17 RC is sunk, I'd like to pull this into 0.94.17.

        Lars Hofhansl made changes -
        Attachment 10501-0.94-v3.txt [ 12629276 ]
        Lars Hofhansl made changes -
        Fix Version/s 0.96.2 [ 12325658 ]
        Fix Version/s 0.98.1 [ 12325664 ]
        Fix Version/s 0.99.0 [ 12325675 ]
        Fix Version/s 0.94.17 [ 12325845 ]
        Lars Hofhansl added a comment -

        Fixed the Javadoc.
        This is what I would like to commit.

        Lars Hofhansl made changes -
        Attachment 10501-0.94-v4.txt [ 12629281 ]
        Lars Hofhansl made changes -
        Assignee Lars Hofhansl [ lhofhansl ]
        Lars Hofhansl added a comment -

        Trunk patch with test fix.

        Lars Hofhansl made changes -
        Attachment 10501-trunk.txt [ 12629296 ]
        Lars Hofhansl added a comment -

        Let's get a test run in.

        Lars Hofhansl made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629296/10501-trunk.txt
        against trunk revision .
        ATTACHMENT ID: 12629296

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Make findbugs happy (and remove the HFilePerformanceEvaluation part - mixed in from another jira)

        Lars Hofhansl made changes -
        Attachment 10501-trunk-v2.txt [ 12629311 ]
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629311/10501-trunk-v2.txt
        against trunk revision .
        ATTACHMENT ID: 12629311

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        If there are no objections I am going to commit this soon.

        Lars Hofhansl added a comment -

        And to summarize the change in behavior (128mb memstores, 10g regions; sizes in MB):

        no. regions      1     2     3      4      5      6      7      8      9
        w/ patch       256  2048  6912  10240  10240  10240  10240  10240  10240
        w/o patch      128   512  1152   2048   3200   4608   6272   8192  10240

        So as soon as we see multiple regions of the same table we'd grow pretty quickly; until then we allow two flushes before we split.
        Enis Soztutar, you OK with the last patch? stack, I'd like to get your blessing of the patch first since you came up with the original version.
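
        (A throwaway loop to double-check the rows above, using the formulas from the release note:)

          long flush = 128, max = 10240; // sizes in MB
          for (int n = 1; n <= 9; n++) {
            long withPatch = Math.min(max, 2 * flush * n * n * n); // 256, 2048, 6912, then capped
            long withoutPatch = Math.min(max, flush * n * n);      // 128, 512, 1152, 2048, ...
            System.out.println(n + "\t" + withPatch + "\t" + withoutPatch);
          }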

        stack added a comment -

        OK. Needs fat release note.

        stack added a comment -

        What if someone has configuration for when to split already? Will it be overridden or preserved?

        Lars Hofhansl added a comment -

        Currently IncreasingToUpperBoundRegionSplitPolicy is not configurable. If using another RegionSplitPolicy it's fine.
        You sound a bit concerned. You'd rather not change it? Or change is too aggressive?

        I think the key is that there is no scenario now that would split earlier than before (because that could lead to a split storm). For any region count N >= 1 we have 2*flushSize*N^3 >= flushSize*N^2, so the new threshold is never lower than the old one.
        As long as a table is sparsely distributed (i.e. 0 or 1 regions on a given region server) the behavior would be very similar; we'd split at 256m instead of 128m.
        It's just that as soon as we see more regions on a regionserver we should start to assume that the table is already somewhat distributed across the cluster.

        stack added a comment -

        OK. Go for it. Release note it.

        Lars Hofhansl added a comment -

        How's this for a release note?

        stack added a comment -

        Release note looks good. Add note that initial split size went up from 128M

        Lars Hofhansl made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Lars Hofhansl added a comment -

        0.94 patch with test fix

        Lars Hofhansl made changes -
        Attachment 10501-0.94-v5.txt [ 12629602 ]
        Lars Hofhansl made changes -
        Summary Make IncreasingToUpperBoundRegionSplitPolicy configurable Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions
        Lars Hofhansl added a comment -

        Committed to all branches. Thanks for taking a look stack (and Enis Soztutar).

        Lars Hofhansl made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #29 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/29/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-JDK7 #58 (See https://builds.apache.org/job/HBase-0.94-JDK7/58/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94 #1294 (See https://builds.apache.org/job/HBase-0.94/1294/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        ABORTED: Integrated in HBase-0.94-security #418 (See https://builds.apache.org/job/HBase-0.94-security/418/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98 #165 (See https://builds.apache.org/job/HBase-0.98/165/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569506)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96 #299 (See https://builds.apache.org/job/hbase-0.96/299/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569505)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #154 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/154/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569506)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4928 (See https://builds.apache.org/job/HBase-TRUNK/4928/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569507)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #207 (See https://builds.apache.org/job/hbase-0.96-hadoop2/207/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569505)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Enis Soztutar added a comment -

        sorry, late +1.

        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #93 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/93/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569507)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Lars Hofhansl made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                       Time In Source Status   Execution Times   Last Executer    Last Execution Date
        Open -> Patch Available          5d 7h 22m               1                 Lars Hofhansl    17/Feb/14 01:47
        Patch Available -> Open          1d 17h 51m              1                 Lars Hofhansl    18/Feb/14 19:38
        Open -> Resolved                 5m 34s                  1                 Lars Hofhansl    18/Feb/14 19:44
        Resolved -> Closed               7d 9h 1m                1                 Lars Hofhansl    26/Feb/14 04:45

  People

  • Assignee: Lars Hofhansl
  • Reporter: Lars Hofhansl
  • Votes: 0
  • Watchers: 5