HBASE-10501: Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.96.2, 0.98.1, 0.99.0, 0.94.17
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Changes the default split policy to avoid too many regions with default settings.
      The old policy calculates the split size at each RS as follows: MIN(maxFileSize, flushSize*NoRegions^2) (NoRegions is the number of regions for the table in question seen on this RS)

      The new policy calculates the size this way: MIN(maxFileSize, flushSize*2*NoRegions^3)
      Note that the initial split size is now 2 * the flushSize. With default settings it increases from 128mb to 256mb.

      The new policy still allows spreading out the regions over the cluster quickly, but then grows the desired size fairly quickly in order to avoid too many regions per RS.
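      For illustration, a minimal sketch of the two formulas (hypothetical class and method names; not the actual IncreasingToUpperBoundRegionSplitPolicy source):

          public class SplitSizeSketch {
              // Old policy: MIN(maxFileSize, flushSize * NoRegions^2)
              static long oldPolicy(long flushSize, long maxFileSize, long regions) {
                  return Math.min(maxFileSize, flushSize * regions * regions);
              }

              // New policy: MIN(maxFileSize, 2 * flushSize * NoRegions^3)
              static long newPolicy(long flushSize, long maxFileSize, long regions) {
                  return Math.min(maxFileSize, 2 * flushSize * regions * regions * regions);
              }

              public static void main(String[] args) {
                  long mb = 1024L * 1024L;
                  // With defaults (128mb flush, 10gb max) the first split moves from 128mb to 256mb.
                  System.out.println(oldPolicy(128 * mb, 10240 * mb, 1) / mb); // 128
                  System.out.println(newPolicy(128 * mb, 10240 * mb, 1) / mb); // 256
              }
          }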

      Description

      During some (admittedly artificial) load testing we found a large amount of split activity, which we tracked down to IncreasingToUpperBoundRegionSplitPolicy.

      The current logic is this (from the comments):
      "regions that are on this server that all are of the same table, squared, times the region flush size OR the maximum region split size, whichever is smaller"

      So with a flush size of 128mb and a max file size of 20gb, we'd need 13 regions of the same table on an RS to reach the max size.
      With a 10gb max file size it is still 9 regions of the same table.
      Considering that the number of regions that an RS can carry is limited and there might be multiple tables, this should be more configurable.

      I think the squaring is smart and we do not need to change it.

      We could

      • Make the start size configurable and default it to the flush size
      • Add a multiplier for the initial size, i.e. start with n * flushSize
      • Also change the default to start with 2*flush size

      Of course one can override the default split policy, but these seem like simple tweaks.

      Or we could instead set the goal of how many regions of the same table would need to be present in order to reach the max size. In that case we'd start with maxSize/goal^2. So if max size is 20gb and the goal is three we'd start with 20g/9 = 2.2g for the initial region size.
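      For illustration, that arithmetic as a quick sketch (hypothetical helper code, not part of any patch here):

          public class SplitMath {
              public static void main(String[] args) {
                  long flushSize = 128, maxFileSize = 20 * 1024; // in MB
                  // The old policy reaches maxFileSize once flushSize * n^2 >= maxFileSize,
                  // i.e. after n = ceil(sqrt(maxFileSize / flushSize)) regions on the RS.
                  long n = (long) Math.ceil(Math.sqrt((double) maxFileSize / flushSize));
                  System.out.println(n); // 13 (for 10gb regions it is 9)
                  // Goal-based alternative: initial size = maxSize / goal^2.
                  long goal = 3;
                  System.out.println(maxFileSize / (goal * goal)); // 2275 MB, about 2.2g
              }
          }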

      stack, I'm especially interested in your opinion.

      1. 10501-0.94-v5.txt
        5 kB
        Lars Hofhansl
      2. 10501-trunk-v2.txt
        5 kB
        Lars Hofhansl
      3. 10501-trunk.txt
        7 kB
        Lars Hofhansl
      4. 10501-0.94-v4.txt
        3 kB
        Lars Hofhansl
      5. 10501-0.94-v3.txt
        3 kB
        Lars Hofhansl
      6. 10501-0.94-v2.txt
        3 kB
        Lars Hofhansl
      7. 10501-0.94.txt
        2 kB
        Lars Hofhansl

        Activity

        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #93 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/93/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569507)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Enis Soztutar added a comment -

        sorry, late +1.

        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96-hadoop2 #207 (See https://builds.apache.org/job/hbase-0.96-hadoop2/207/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569505)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-TRUNK #4928 (See https://builds.apache.org/job/HBase-TRUNK/4928/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569507)

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #154 (See https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/154/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569506)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.96 #299 (See https://builds.apache.org/job/hbase-0.96/299/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569505)

        • /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.98 #165 (See https://builds.apache.org/job/HBase-0.98/165/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569506)

        • /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        ABORTED: Integrated in HBase-0.94-security #418 (See https://builds.apache.org/job/HBase-0.94-security/418/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94 #1294 (See https://builds.apache.org/job/HBase-0.94/1294/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-JDK7 #58 (See https://builds.apache.org/job/HBase-0.94-JDK7/58/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Hudson added a comment -

        FAILURE: Integrated in HBase-0.94-on-Hadoop-2 #29 (See https://builds.apache.org/job/HBase-0.94-on-Hadoop-2/29/)
        HBASE-10501 Improve IncreasingToUpperBoundRegionSplitPolicy to avoid too many regions. (larsh: rev 1569504)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
        Lars Hofhansl added a comment -

        Committed to all branches. Thanks for taking a look stack (and Enis Soztutar).

        Lars Hofhansl added a comment -

        0.94 patch with test fix

        stack added a comment -

        Release note looks good. Add note that initial split size went up from 128M

        Lars Hofhansl added a comment -

        How's this for a release note?

        stack added a comment -

        OK. Go for it. Release note it.

        Lars Hofhansl added a comment -

        Currently IncreasingToUpperBoundRegionSplitPolicy is not configurable. If you are using another RegionSplitPolicy it's fine.
        You sound a bit concerned. Would you rather not change it? Or is the change too aggressive?

        I think the key is that there is no scenario now that would split earlier than before (because that could lead to a split storm).
        As long as a table is sparsely distributed (i.e. 0 or 1 regions on a given region server) the behavior is very similar; we'd split at 256m instead of 128m.
        But as soon as we see more regions on a region server, we should start to assume that the table is already somewhat distributed across the cluster.

        stack added a comment -

        What if someone already has a configuration for when to split? Will it be overridden or preserved?

        stack added a comment -

        OK. Needs fat release note.

        Lars Hofhansl added a comment -

        And to summarize the change in behavior (128mb memstores, 10gb regions; sizes in MB):

        no. regions    1     2     3     4     5     6     7     8     9
        w/ patch     256  2048  6912 10240 10240 10240 10240 10240 10240
        w/o patch    128   512  1152  2048  3200  4608  6272  8192 10240
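        For reference, a small sketch that reproduces these numbers (assuming 128mb flush size and 10gb max file size; illustrative only, not the patch itself):

            public class SplitTable {
                public static void main(String[] args) {
                    long flush = 128, max = 10240; // MB
                    for (long n = 1; n <= 9; n++) {
                        long withPatch = Math.min(max, 2 * flush * n * n * n); // cubed, 2x start
                        long withoutPatch = Math.min(max, flush * n * n);      // squared
                        System.out.printf("%d regions: %d vs %d MB%n", n, withPatch, withoutPatch);
                    }
                }
            }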

        So as soon as we see multiple regions of the same table we grow pretty quickly; until then we allow two flushes before we split.
        Enis Soztutar, are you OK with the last patch? stack, I'd like to get your blessing on the patch first, since you came up with the original version.

        Lars Hofhansl added a comment -

        If there are no objections I am going to commit this soon.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629311/10501-trunk-v2.txt
        against trunk revision .
        ATTACHMENT ID: 12629311

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8725//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Make findbugs happy (and remove the HFilePerformanceEvaluation part - mixed in from another jira)

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12629296/10501-trunk.txt
        against trunk revision .
        ATTACHMENT ID: 12629296

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop1.1. The patch compiles against the hadoop 1.1 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8723//console

        This message is automatically generated.

        Lars Hofhansl added a comment -

        Let's get a test run in.

        Lars Hofhansl added a comment -

        Trunk patch with test fix.

        Lars Hofhansl added a comment -

        Fixed the Javadoc.
        This is what I would like to commit.

        Lars Hofhansl added a comment -

        v3

        • cubes the count
        • starts with 2 x flushSize
        • makes initial size configurable

        Please have a look. Now that the first 0.94.17 RC is sunk, I'd like to pull this into 0.94.17.

        Lars Hofhansl added a comment -

        Marking critical, as the current default will lead to a bad experience for anyone using HBase for the first time and loading a lot of data.

        Enis Soztutar added a comment -

        +1 for changing the default. 2*flushSize + cube looks good.
        I think we should also go for more global split decisions, rather than local decisions. But this is for another issue.

        Lars Hofhansl added a comment -

        Cubing could be better indeed. Was thinking not to change too much. But maybe we should.

        So with initialSize = 2*flushSize and cubing we'd get the following by default (128m memstores, 10g regions):
        256m, 2048m, 6912m, 10g
        With squaring we'd get
        256m, 1024m, 2304m, 4096m, 6400m, 9216m, 10g
        With 4*flushsize and squaring it's:
        512m, 2048m, 4608m, 8192m, 10g
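        A tiny sketch that generates these three progressions (illustrative only, hypothetical code):

            public class GrowthCompare {
                public static void main(String[] args) {
                    long flush = 128, max = 10240; // MB
                    for (long n = 1; n <= 7; n++) {
                        long cubed2x = Math.min(max, 2 * flush * n * n * n); // 2*flushSize, cubed
                        long squared2x = Math.min(max, 2 * flush * n * n);   // 2*flushSize, squared
                        long squared4x = Math.min(max, 4 * flush * n * n);   // 4*flushSize, squared
                        System.out.printf("%d: %d / %d / %d MB%n", n, cubed2x, squared2x, squared4x);
                    }
                }
            }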

        Not sure. Looks like 2*flushSize + cubing is best. When the cluster is sparsely used we spread quickly, but we also grow quickly once we start seeing multiple regions. Let's do that then?
        As I said, this is fuzzy and there is no right or wrong.

        Do we have to worry about numerical overflow? We'd blow past 2^63 after a few 1000 regions depending on flush size. Maybe clamp to max file size after 100 regions.
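        One hypothetical way to clamp, sketched below (illustrative only; the committed change may differ):

            public class OverflowSafeSplit {
                // Clamp the region count before cubing so initialSize * n^3 cannot
                // overflow a long; past ~100 regions the result is maxFileSize anyway.
                static long splitSize(long initialSize, long maxFileSize, int regionCount) {
                    long n = Math.min(regionCount, 100);
                    long cubed = n * n * n; // at most 1,000,000, so no overflow for realistic sizes
                    return Math.min(maxFileSize, initialSize * cubed);
                }
            }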

        One bit of information. In our test on a 9 RS/DN cluster we loaded 1bn rows/250gb and ended up with 171 regions. I.e. 1.4g on average and 19 per region server. Definitely not good - and we have 256mb flush size and 10g max file size. Now, 250gb is not exactly a lot of data, but it illustrates the point.
        (That's many more regions than our math here predicts; presumably because some of the RSs have fewer regions at times, so they split even faster.)

        Maybe I can get our perf folks to do some testing.

        stack added a comment -

        lgtm. When you do your little squaring math, it's sqrt(50), which is 7? Still too much? The cube root is between 3 and 4. Would that be better?

        Lars Hofhansl added a comment -

        OK, so here's my proposal:

        • By default we set the initial size to 2*flushSize
        • Allow the initial size to be configured

        Starting with the flush size just makes for a bad experience when folks play around with loading a bunch of data for the first time; it leads to overly aggressive splitting. Adding the factor of two here means that not every flush will result in a split until we see multiple regions per RS.

        Lars Hofhansl added a comment -

        Thinking about this a bit more: it is probably fairly unlikely that an RS will see 3 or 4 regions of the same table unless that table is already spread over the cluster. So I think we should ensure we're pretty close to the max file size when we see that many regions.
        In most setups the ratio of maxFileSize to flushSize is somewhere around 100:1 (10g regions and a 128mb flush size make it 80:1).

        So if we start with 3*flushsize we get an initial ratio of roughly 30:1, and thus we'd reach the max at about 5 regions (sqrt(30) = 5.4...).
        That would also avoid the initial split storm after each flush.

        Should I change the initial size to 3 or 4 x the flush size, instead of making it configurable at all?

        Lars Hofhansl added a comment -

        Here's a trivial patch that allows setting the initial size.

        Or is it simpler to just tweak the default to 2*flushsize?

        Lars Hofhansl added a comment (edited) -

        Starting with flush size means that every flush will cause a split until we see multiple regions of a table on an RS.
        Maybe just starting with 2*flushsize would avert that.

        Lars Hofhansl added a comment -

        You mean do number of regions ^ 3? Or start with 3*flushsize?

        Flush size might just be too small as an initial size: it is set based on memory considerations, whereas the max size is set based on disk I/O considerations. Until we start to see multiple regions of a table on an RS, we'd split on each flush.

        So many options

        stack added a comment -

        The original motivation was to split quickly at first, so regions got farmed out across the cluster sooner and the full cluster got in on the action faster than if, say, you had to wait until one region hit the max size.

        Was also hoping to keep the logic simple – easy to understand – and was trying to make it so you'd not have to touch this splitter going forward.

        As you point out, 13 splits to reach the max seems too many, especially if there are many tables on the one server. The suggestions for when to split the first time all seem fine – just pick one, I'd say.

        What if we tripled rather than doubled?


  People

  • Assignee: Lars Hofhansl
  • Reporter: Lars Hofhansl
  • Votes: 0
  • Watchers: 5
