HBase / HBASE-9763

Scan javadoc doesn't fully capture semantics of start and stop row

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels:
      None

      Description

      The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't accurately capture the semantics of using row key prefixes as start and stop values. Both methods describe appending a trailing null byte to switch the inclusive/exclusive semantics of setStartRow and setStopRow, respectively.

      Appending a trailing null byte to exclude the start row only works when the start row is matched exactly against a full row key. Appending a trailing null byte to include the stop row has even more limitations (see HBASE-9035).

      The basic example is having the following rows:

      AAB
      ABB
      BBC
      BCC
      

      Setting the start row to A and the stop row to B will include AAB and ABB.

      Setting the start row to A\x00 and the stop row to B\x00 returns the same two rows, instead of having any effect on the inclusion/exclusion semantics.
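The ordering argument above can be checked with a small in-memory model. This is a sketch, not HBase code: it assumes row keys are ordered by unsigned lexicographic byte comparison, that the start row is inclusive and that the stop row is exclusive; the class and method names are hypothetical. It shows that A/B and A\x00/B\x00 select the same two rows, while the null-byte trick does change the result once the bounds are exact, full row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical in-memory model of a scan range (not HBase code): row keys are
// sorted by unsigned lexicographic byte order, the start row is inclusive and
// the stop row is exclusive.
final class ScanRangeSketch {

    // Compares row keys byte by byte as unsigned values; a proper prefix sorts first.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // ISO-8859-1 maps each char in the range 0x00..0xFF to exactly one byte.
    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Returns every row r with start <= r < stop, in sorted order.
    static List<String> scan(List<String> rows, String start, String stop) {
        TreeSet<byte[]> sorted = new TreeSet<byte[]>(ScanRangeSketch::compareRows);
        for (String row : rows) {
            sorted.add(bytes(row));
        }
        byte[] stopKey = bytes(stop);
        List<String> result = new ArrayList<>();
        for (byte[] row : sorted.tailSet(bytes(start), true)) {
            if (compareRows(row, stopKey) >= 0) {
                break; // the stop row itself is excluded
            }
            result.add(new String(row, StandardCharsets.ISO_8859_1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("AAB", "ABB", "BBC", "BCC");
        // Prefix bounds: the trailing null byte makes no difference.
        System.out.println(scan(rows, "A", "B"));             // [AAB, ABB]
        System.out.println(scan(rows, "A\0", "B\0"));         // [AAB, ABB]
        // Full-key bounds: the null byte excludes ABB and includes BBC.
        System.out.println(scan(rows, "ABB\0", "BBC\0"));     // [BBC]
    }
}
```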

        Activity

        Enis Soztutar added a comment -

        I think we could add Scan.setStartRowInclusive() and Scan.setStopRowInclusive() so that we won't need byte-appending shenanigans.
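A rough sketch of what explicit flags could look like, using a hypothetical RowRange class rather than any real Scan API, and assuming unsigned lexicographic row key ordering. As with the null byte, an inclusive or exclusive flag only changes the outcome when a row key equals a bound exactly, so flags alone would not change how prefix bounds behave.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical range with explicit inclusive/exclusive flags on both bounds,
// in the spirit of the proposed setStartRowInclusive()/setStopRowInclusive().
final class RowRange {
    private final byte[] start;
    private final boolean startInclusive;
    private final byte[] stop;
    private final boolean stopInclusive;

    RowRange(String start, boolean startInclusive, String stop, boolean stopInclusive) {
        this.start = start.getBytes(StandardCharsets.ISO_8859_1);
        this.startInclusive = startInclusive;
        this.stop = stop.getBytes(StandardCharsets.ISO_8859_1);
        this.stopInclusive = stopInclusive;
    }

    // Unsigned lexicographic comparison; a proper prefix sorts first.
    private static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    boolean contains(String row) {
        byte[] key = row.getBytes(StandardCharsets.ISO_8859_1);
        int vsStart = compare(key, start);
        int vsStop = compare(key, stop);
        boolean afterStart = startInclusive ? vsStart >= 0 : vsStart > 0;
        boolean beforeStop = stopInclusive ? vsStop <= 0 : vsStop < 0;
        return afterStart && beforeStop;
    }

    List<String> select(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (contains(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("aa", "ab", "ac", "ad", "ae", "af");
        // Full-key bounds: the flags behave like the null-byte trick.
        System.out.println(new RowRange("aa", false, "ae", true).select(full)); // [ab, ac, ad, ae]
        List<String> prefixed = Arrays.asList("AAB", "ABB", "BBC", "BCC");
        // Prefix bounds: an inclusive stop of "B" still does not reach "BBC".
        System.out.println(new RowRange("A", false, "B", true).select(prefixed)); // [AAB, ABB]
    }
}
```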

        Gabriel Reid added a comment -

        Thanks for taking a look. The difference between your test and my example is that my example is based on a row key prefix, while your test is based on full matching. As with the discussion of the earlier documentation bug in HBASE-9035, there's a lot of potential for confusion when appending null bytes to get different semantics.

        I think this is a potentially confusing topic for users no matter what. Instead of removing the null-byte appending advice from the javadoc, another option might be to qualify it with more information about what the start row and stop row of a scan really are. Or maybe I'm just worrying too much about it and it's not really that big of an issue.

        Nick Dimiduk added a comment -

        I've also seen code (in OpenTSDB, IIRC) that increments the start row value when exclusion semantics are desired. This is a little messy to do by hand because byte overflow needs to be accounted for.
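One way to handle the overflow, sketched with a hypothetical nextPrefix helper (not taken from OpenTSDB or HBase): drop trailing 0xFF bytes, which cannot be incremented without wrapping to 0x00, then increment the last remaining byte. Note that this differs from appending \x00: used as a start row, the incremented key skips the original row and every row that begins with it. The empty array returned when all bytes are 0xFF means "no such key", which a caller would need to handle explicitly.

```java
import java.util.Arrays;

// Hypothetical helper (not OpenTSDB or HBase code): computes the first row key
// that sorts after every key beginning with the given bytes.
final class RowKeyIncrement {

    // Returns an empty array when every byte is 0xFF, since no such key exists.
    static byte[] nextPrefix(byte[] prefix) {
        int i = prefix.length - 1;
        // A trailing 0xFF would wrap to 0x00 and sort earlier, so drop it and
        // carry the increment into the previous byte instead.
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] next = Arrays.copyOf(prefix, i + 1); // leaves the input untouched
        next[i]++; // next[i] is not 0xFF, so it cannot overflow as an unsigned value
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextPrefix(new byte[] {0x61, 0x65})));        // [97, 102]
        System.out.println(Arrays.toString(nextPrefix(new byte[] {0x61, (byte) 0xFF}))); // [98]
        System.out.println(Arrays.toString(nextPrefix(new byte[] {(byte) 0xFF})));        // []
    }
}
```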

        Nick Dimiduk added a comment -

        Perhaps I'm misunderstanding your example, but the trailing null-byte advice works as documented:

        hbase(main):010:0> scan 't1', {STARTROW=>'aa'}
        ROW   COLUMN+CELL
         aa   column=f:, timestamp=1381942385262, value=aa
         ab   column=f:, timestamp=1381942391424, value=ab
         ac   column=f:, timestamp=1381942396077, value=ac
         ad   column=f:, timestamp=1381942400858, value=ad
         ae   column=f:, timestamp=1381942405261, value=ae
         af   column=f:, timestamp=1381942409758, value=af
        6 row(s) in 0.0870 seconds

        hbase(main):011:0> scan 't1', {STARTROW=>'aa', STOPROW=>'ae'}
        ROW   COLUMN+CELL
         aa   column=f:, timestamp=1381942385262, value=aa
         ab   column=f:, timestamp=1381942391424, value=ab
         ac   column=f:, timestamp=1381942396077, value=ac
         ad   column=f:, timestamp=1381942400858, value=ad
        4 row(s) in 0.0510 seconds

        hbase(main):012:0> scan 't1', {STARTROW=>'aa\x00', STOPROW=>'ae'}
        ROW   COLUMN+CELL
         ab   column=f:, timestamp=1381942391424, value=ab
         ac   column=f:, timestamp=1381942396077, value=ac
         ad   column=f:, timestamp=1381942400858, value=ad
        3 row(s) in 0.0350 seconds

        hbase(main):013:0> scan 't1', {STARTROW=>'aa\x00', STOPROW=>'ae\x00'}
        ROW   COLUMN+CELL
         ab   column=f:, timestamp=1381942391424, value=ab
         ac   column=f:, timestamp=1381942396077, value=ac
         ad   column=f:, timestamp=1381942400858, value=ad
         ae   column=f:, timestamp=1381942405261, value=ae
        4 row(s) in 0.0560 seconds
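The four shell scans above can be reproduced with a hypothetical in-memory model (not HBase code), assuming unsigned lexicographic ordering, an inclusive start row and an exclusive stop row. Because every bound in this test is an exact, full row key, appending \x00 moves the bound just past that key, which is why the trick works here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the four shell scans on table t1 (not HBase code).
final class FullKeyScanSketch {

    // Unsigned lexicographic comparison; a proper prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // sortedRows must already be in row key order. Start is inclusive, stop is
    // exclusive, and a null stop means "scan to the end of the table".
    static List<String> scan(List<String> sortedRows, String start, String stop) {
        byte[] startKey = bytes(start);
        byte[] stopKey = stop == null ? null : bytes(stop);
        List<String> out = new ArrayList<>();
        for (String row : sortedRows) {
            byte[] key = bytes(row);
            if (compare(key, startKey) < 0) {
                continue; // before the start row
            }
            if (stopKey != null && compare(key, stopKey) >= 0) {
                break; // reached the stop row
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> t1 = Arrays.asList("aa", "ab", "ac", "ad", "ae", "af");
        System.out.println(scan(t1, "aa", null));     // [aa, ab, ac, ad, ae, af]
        System.out.println(scan(t1, "aa", "ae"));     // [aa, ab, ac, ad]
        System.out.println(scan(t1, "aa\0", "ae"));   // [ab, ac, ad]
        System.out.println(scan(t1, "aa\0", "ae\0")); // [ab, ac, ad, ae]
    }
}
```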
        

        When the precise rowkey values are unknown, I recommend using the PrefixFilter and letting it sort things out.
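A PrefixFilter keeps rows whose key begins with a given prefix. The selection it performs (not its actual implementation) can be modeled with the hypothetical sketch below, assuming rows arrive in unsigned lexicographic order: start at the prefix and stop at the first row at or after it that no longer matches, since no later row can match either.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of prefix-based row selection (not HBase's PrefixFilter code).
final class PrefixScanSketch {

    // Unsigned lexicographic comparison; a proper prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static boolean hasPrefix(byte[] row, byte[] prefix) {
        if (row.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // sortedRows must already be in row key order.
    static List<String> prefixScan(List<String> sortedRows, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.ISO_8859_1);
        List<String> out = new ArrayList<>();
        for (String row : sortedRows) {
            byte[] key = row.getBytes(StandardCharsets.ISO_8859_1);
            if (compare(key, p) < 0) {
                continue; // sorts before the prefix
            }
            if (!hasPrefix(key, p)) {
                break; // past the prefix range; later rows cannot match
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("AAB", "ABB", "BBC", "BCC");
        System.out.println(prefixScan(rows, "A"));  // [AAB, ABB]
        System.out.println(prefixScan(rows, "BB")); // [BBC]
    }
}
```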

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12608453/HBASE-9763.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        -1 site. The patch appears to cause mvn site goal to fail.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7544//console

        This message is automatically generated.

        Gabriel Reid added a comment -

        Trivial patch to remove the trailing-null-byte advice and to be a bit clearer about how the scan start and stop rows behave when matching row key prefixes.


          People

          • Assignee: Gabriel Reid
          • Reporter: Gabriel Reid
          • Votes: 0
          • Watchers: 5
