HBASE-7748

Add DelimitedKeyPrefixRegionSplitPolicy

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.6, 0.95.0
    • Fix Version/s: 0.94.5, 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      DelimitedKeyPrefixRegionSplitPolicy is similar to KeyPrefixRegionSplitPolicy, but it derives the key prefix from a delimiter in the row key instead of using a fixed-length prefix.

      It can be used for META regions, since META row keys have the form table_name,start_key,region_id.encoded_region_name.
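
      The idea in the description can be illustrated with a minimal, standalone sketch. It is not the attached patch: the class and method names below are hypothetical, and it only assumes that the split point is truncated at the first occurrence of the delimiter and left unchanged when the delimiter is absent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the class added by the patch.
class DelimitedPrefixSketch {

    /**
     * Returns the part of splitPoint before the first occurrence of delimiter,
     * or splitPoint unchanged when the delimiter does not occur in it.
     */
    static byte[] prefixSplitPoint(byte[] splitPoint, byte[] delimiter) {
        int index = indexOf(splitPoint, delimiter);
        return index < 0 ? splitPoint : Arrays.copyOf(splitPoint, index);
    }

    /** Naive search for the first occurrence of needle in haystack; -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) {
            return -1; // assumption: an empty delimiter is treated as "not found"
        }
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** String convenience wrapper around prefixSplitPoint, used for the demo below. */
    static String splitPrefix(String rowKey, String delimiter) {
        byte[] prefix = prefixSplitPoint(
            rowKey.getBytes(StandardCharsets.UTF_8),
            delimiter.getBytes(StandardCharsets.UTF_8));
        return new String(prefix, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Delimiter appears twice; only the first occurrence counts.
        System.out.println(splitPrefix("user42_click_0007", "_"));              // user42
        // A key shaped like table_name,start_key,region_id.encoded_region_name.
        System.out.println(splitPrefix("t1,row500,1359999999999.abcdef", ",")); // t1
    }
}
```

      With a comma as the delimiter, every row for the same table_name maps to the same prefix, which is what would keep those rows in one region.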

      1. hbase-7748_v3-0.94.patch
        8 kB
        Enis Soztutar
      2. hbase-7748_v3.patch
        8 kB
        Enis Soztutar
      3. hbase-7748_v2.patch
        8 kB
        Enis Soztutar
      4. hbase-7748_v1.patch
        6 kB
        Enis Soztutar

          Activity

          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #12 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/12/)
          HBASE-7748. Add DelimitedKeyPrefixRegionSplitPolicy (Revision 1442803)

          Result = FAILURE
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
          Robert Dyer added a comment -

          BTW all, I filed HBASE-7877 to fix this inefficiency.

          Robert Dyer added a comment -

          @Enis, perhaps a change in the data model would avoid this situation. However, to me, regardless of the data model, this behaviour appears to be non-optimal.

          We select a split point (roughly the middle) and then arbitrarily move it in one direction (to find a group boundary). The original split point is optimal in terms of splitting. Thus, we should find the nearest usable split point to that row and keep the split as close to optimal as possible.

          Sure, the example I gave is an extreme case, but even ignoring that you might end up with non-optimal splits. It may be the case that moving down a single row would find a group boundary, yet we move back up the rows anyway.
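
          The "nearest usable split point" idea from this comment can be sketched as a search in both directions from the midpoint. This is a hypothetical illustration over an in-memory, sorted list of row keys; the names are invented here, and it assumes that the region's first row is not a usable split point.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not code from the patch or from HBASE-7877.
class NearestBoundarySketch {

    /** Prefix of a row key up to (excluding) the first delimiter; the whole key if absent. */
    static String prefix(String row, char delimiter) {
        int i = row.indexOf(delimiter);
        return i < 0 ? row : row.substring(0, i);
    }

    /**
     * True when rows.get(i) is the first row of a new prefix group.
     * Index 0 is excluded: it is the region start, so splitting there is assumed unusable.
     */
    static boolean isBoundary(List<String> rows, int i, char delimiter) {
        return i > 0 && i < rows.size()
            && !prefix(rows.get(i), delimiter).equals(prefix(rows.get(i - 1), delimiter));
    }

    /**
     * Returns the index of the group boundary closest to mid, looking both up and down,
     * or -1 when the rows form a single group. Ties go to the lower index.
     */
    static int nearestBoundary(List<String> rows, int mid, char delimiter) {
        for (int d = 0; d < rows.size(); d++) {
            if (isBoundary(rows, mid - d, delimiter)) {
                return mid - d;
            }
            if (isBoundary(rows, mid + d, delimiter)) {
                return mid + d;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "a_1", "a_2", "a_3", "a_4", "a_5", "a_6", "b_1", "c_1");
        int mid = rows.size() / 2; // index 4, row "a_5"
        // Searching only towards the start would end at index 0 (the region start);
        // searching both ways finds "b_1" two rows further down.
        int split = nearestBoundary(rows, mid, '_');
        System.out.println(split + " -> " + rows.get(split)); // 6 -> b_1
    }
}
```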

          Enis Soztutar added a comment -

          @Robert, that looks like a corner case, which should be handled by changing the data model, rather than split policy, no?

          Ted Yu added a comment -

          @Robert:
          Good observation.

          Robert Dyer added a comment -

          Am I correct in assuming that this implementation will wind up not splitting if there is a very uneven distribution of users? For example, the region has 5 users, but the first user in the region takes up over half the space, so the split point falls within that user's rows.

          Wouldn't it make sense to find the row group's first row (what the current patch does) and find the next group's first row and choose based on which one gives a better split?
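
          The scenario and the proposed alternative can be modelled with a small sketch. It is hypothetical: the names and sizes are invented, and it assumes that a split key equal to the region's first row is unusable. It scores every group boundary by how evenly it divides the region; because the imbalance only grows when moving away from the midpoint, the winner is always one of the two boundaries on either side of the midpoint, i.e. the two candidates named in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; sizes and names are made up for the example.
class BalancedGroupSplitSketch {

    /**
     * groups maps each key prefix, in key order, to the bytes stored under it.
     * Returns the prefix whose first row gives the most even split, or null when
     * the region holds fewer than two groups and has no usable group boundary.
     */
    static String chooseSplitPrefix(LinkedHashMap<String, Long> groups) {
        long total = 0;
        for (long size : groups.values()) {
            total += size;
        }

        String best = null;
        long bestImbalance = Long.MAX_VALUE;
        long left = 0;         // bytes that would land in the lower daughter region
        boolean first = true;  // the first group starts at the region start: not a candidate
        for (Map.Entry<String, Long> e : groups.entrySet()) {
            if (!first) {
                long imbalance = Math.abs(left - (total - left));
                if (imbalance < bestImbalance) {
                    bestImbalance = imbalance;
                    best = e.getKey();
                }
            }
            first = false;
            left += e.getValue();
        }
        return best;
    }

    public static void main(String[] args) {
        // Five users; the first one holds more than half of the region.
        LinkedHashMap<String, Long> groups = new LinkedHashMap<>();
        groups.put("u1", 60L);
        groups.put("u2", 10L);
        groups.put("u3", 10L);
        groups.put("u4", 10L);
        groups.put("u5", 10L);
        // The midpoint lies inside u1, whose first row is the region start.
        // The next group's first row still yields a 60/40 split.
        System.out.println(chooseSplitPrefix(groups)); // u2
    }
}
```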

          Hudson added a comment -

          Integrated in HBase-0.94-security #107 (See https://builds.apache.org/job/HBase-0.94-security/107/)
          HBASE-7748. Add DelimitedKeyPrefixRegionSplitPolicy (Revision 1442803)

          Result = FAILURE
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
          Hudson added a comment -

          Integrated in HBase-0.94 #824 (See https://builds.apache.org/job/HBase-0.94/824/)
          HBASE-7748. Add DelimitedKeyPrefixRegionSplitPolicy (Revision 1442803)

          Result = SUCCESS
          enis :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
          Enis Soztutar added a comment -

          Committed this to 0.94 branch. Attaching straightforward port.

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #391 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/391/)
          HBASE-7748. Add DelimitedKeyPrefixRegionSplitPolicy (Revision 1442408)

          Result = FAILURE
          enis :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #3849 (See https://builds.apache.org/job/HBase-TRUNK/3849/)
          HBASE-7748. Add DelimitedKeyPrefixRegionSplitPolicy (Revision 1442408)

          Result = FAILURE
          enis :
          Files :

          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java
          Enis Soztutar added a comment -

          Attaching committed version, v3.

          Enis Soztutar added a comment -

          Committed this with Ted's suggestions. Will also commit this to 0.94.6 unless there are objections.

          stack added a comment -

          +1 on patch (edit javadoc on commit)

          Ted Yu added a comment -
          + * I.e. rows can be co-located in a regionb by their prefix.
          

          Typo: regionb

          + * As and example, if you have row keys delimited with <code>_</code>, like
          

          Typo: and

          + * ensures that all rows staring with the same userid, belongs to the same region.
          

          Typo: staring, belongs

          Please add stability annotation.

          +      //find the first occurrence of delimiter in split point
          

          The above comment should be explicitly mentioned in class javadoc where delimiter appears twice in rowkey:

          + * <code>userid_eventtype_eventid</code>, and use prefix delimiter _, this split policy
          
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12567704/hbase-7748_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitLogWorker

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4301//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          +1 on patch
          +1 on idea
          +1 on using for atomic META transactions

          Enis Soztutar added a comment -

          Patch v2. Change the configuration key for KeyPrefixRegionSplitPolicy

          Enis Soztutar added a comment -

          Attaching patch.


            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Enis Soztutar
            • Votes:
              0
              Watchers:
              8
