HBase
  1. HBase
  2. HBASE-9079

FilterList getNextKeyHint skips rows that should be included in the results

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.10
    • Fix Version/s: 0.98.0, 0.95.2, 0.94.11
    • Component/s: Filters
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I hit a weird issue/bug and am able to reproduce the error consistently. The problem arises when FilterList has two filters where each implements the getNextKeyHint method.

      The way the current implementation works is, StoreScanner will call matcher.getNextKeyHint() whenever it gets a SEEK_NEXT_USING_HINT. This in turn will call filter.getNextKeyHint() which at this stage is of type FilterList. The implementation in FilterList iterates through all the filters and keeps the max KeyValue that it sees. All is fine if you wrap filters in FilterList in which only one of them implements getNextKeyHint. but if multiple of them implement then that's where things get weird.

      For example:

      • create two filters: one is FuzzyRowFilter and second is ColumnRangeFilter. Both of them implement getNextKeyHint
      • wrap them in FilterList with MUST_PASS_ALL
      • FuzzyRowFilter will seek to the correct first row and then pass it to ColumnRangeFilter which will return the SEEK_NEXT_USING_HINT code.
      • Now in FilterList when getNextKeyHint is called, it calls the one on FuzzyRow first which basically says what the next row should be. While in reality we want the ColumnRangeFilter to give the seek hint.
      • The above behavior skips data that should be returned, which I have verified by using a RowFilter with RegexStringComparator.

      I updated the FilterList to maintain state on which filter returns the SEEK_NEXT_USING_HINT and in getNextKeyHint, I invoke the method on the saved filter and reset that state. I tested it with my current queries and it works fine but I need to run the entire test suite to make sure I have not introduced any regression. In addition to that I need to figure out what should be the behavior when the opeation is MUST_PASS_ONE, but I doubt it should be any different.

      Is my understanding of it being a bug correct ? Or am I trivializing it and ignoring something very important ? If it's tough to wrap your head around the explanation, then I can open a JIRA and upload a patch against 0.94 head.

      1. 9079-0.94-v2.txt
        12 kB
        Lars Hofhansl
      2. 9079-trunk-v2.txt
        12 kB
        Lars Hofhansl
      3. HBASE-9079-0.94.patch
        8 kB
        Viral Bajaria
      4. HBASE-9079-trunk.patch
        8 kB
        Viral Bajaria

        Activity

        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #658 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/658/)
        HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018)

        • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #658 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/658/ ) HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018) /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.95-on-hadoop2 #225 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/225/)
        HBASE-9079 Missed new test (larsh: rev 1512020)

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
          HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019)
        • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        Show
        Hudson added a comment - FAILURE: Integrated in hbase-0.95-on-hadoop2 #225 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/225/ ) HBASE-9079 Missed new test (larsh: rev 1512020) /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019) /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in hbase-0.95 #418 (See https://builds.apache.org/job/hbase-0.95/418/)
        HBASE-9079 Missed new test (larsh: rev 1512020)

        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
          HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019)
        • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        Show
        Hudson added a comment - FAILURE: Integrated in hbase-0.95 #418 (See https://builds.apache.org/job/hbase-0.95/418/ ) HBASE-9079 Missed new test (larsh: rev 1512020) /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512019) /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        Hide
        Hudson added a comment -

        FAILURE: Integrated in HBase-TRUNK #4357 (See https://builds.apache.org/job/HBase-TRUNK/4357/)
        HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018)

        • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Show
        Hudson added a comment - FAILURE: Integrated in HBase-TRUNK #4357 (See https://builds.apache.org/job/HBase-TRUNK/4357/ ) HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512018) /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94 #1100 (See https://builds.apache.org/job/HBase-0.94/1100/)
        HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Show
        Hudson added a comment - SUCCESS: Integrated in HBase-0.94 #1100 (See https://builds.apache.org/job/HBase-0.94/1100/ ) HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021) /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Hide
        Hudson added a comment -

        SUCCESS: Integrated in HBase-0.94-security #249 (See https://builds.apache.org/job/HBase-0.94-security/249/)
        HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021)

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java
        • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Show
        Hudson added a comment - SUCCESS: Integrated in HBase-0.94-security #249 (See https://builds.apache.org/job/HBase-0.94-security/249/ ) HBASE-9079 FilterList getNextKeyHint skips rows that should be included in the results (Viral Bajaria and LarsH) (larsh: rev 1512021) /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFilterList.java /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestFuzzyRowAndColumnRangeFilter.java
        Hide
        Viral Bajaria added a comment -

        Thanks for all the help Ted Yu and Lars Hofhansl to get this patch cleaned up and integrated! Look forward to contributing more. You guys rock!

        Show
        Viral Bajaria added a comment - Thanks for all the help Ted Yu and Lars Hofhansl to get this patch cleaned up and integrated! Look forward to contributing more. You guys rock!
        Hide
        Lars Hofhansl added a comment -

        Committed to 0.94, 0.95, and trunk.
        Thanks for the patch, Viral.

        Show
        Lars Hofhansl added a comment - Committed to 0.94, 0.95, and trunk. Thanks for the patch, Viral.
        Hide
        Viral Bajaria added a comment -

        Sorry for the delay. I did a local test with production data and it looks fine.

        Show
        Viral Bajaria added a comment - Sorry for the delay. I did a local test with production data and it looks fine.
        Hide
        Lars Hofhansl added a comment -

        <ping>

        It's also fine to push into next month's 0.94.12.

        Show
        Lars Hofhansl added a comment - <ping> It's also fine to push into next month's 0.94.12.
        Hide
        Viral Bajaria added a comment -

        Not yet, got caught in something else. Will get to it before EOD (PST) for sure.

        Show
        Viral Bajaria added a comment - Not yet, got caught in something else. Will get to it before EOD (PST) for sure.
        Hide
        Lars Hofhansl added a comment -

        Did you get a chance to test this with real data, Viral Bajaria?

        Show
        Lars Hofhansl added a comment - Did you get a chance to test this with real data, Viral Bajaria ?
        Hide
        Lars Hofhansl added a comment -

        Thanks Viral. I added you as HBase contributor and credited this issue to you.

        Show
        Lars Hofhansl added a comment - Thanks Viral. I added you as HBase contributor and credited this issue to you.
        Hide
        Viral Bajaria added a comment -

        Looks good to me. I have applied the patch to my local repo and will test with real data in a bit. Will provide an update after that (hopefully before tomorrow).

        Show
        Viral Bajaria added a comment - Looks good to me. I have applied the patch to my local repo and will test with real data in a bit. Will provide an update after that (hopefully before tomorrow).
        Hide
        Lars Hofhansl added a comment -

        Will move the comment upon commit. So Ted and Viral, you both good with this?

        Show
        Lars Hofhansl added a comment - Will move the comment upon commit. So Ted and Viral, you both good with this?
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12596477/9079-trunk-v2.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 6 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12596477/9079-trunk-v2.txt against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 6 new or modified tests. +1 hadoop1.0 . The patch compiles against the hadoop 1.0 profile. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6623//console This message is automatically generated.
        Hide
        Ted Yu added a comment -

        I ran Filter related tests and they passed.

             for (Filter filter : filters) {
               KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
        -      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
        +      if (curKeyHint == null) {
                 // If we ever don't have a hint and this is must-pass-one, then no hint
        

        nit: maybe lift the comment about must-pass-one before the for loop.

        Show
        Ted Yu added a comment - I ran Filter related tests and they passed. for (Filter filter : filters) { KeyValue curKeyHint = filter.getNextKeyHint(currentKV); - if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) { + if (curKeyHint == null ) { // If we ever don't have a hint and this is must-pass-one, then no hint nit: maybe lift the comment about must-pass-one before the for loop.
        Hide
        Lars Hofhansl added a comment -

        And for trunk. Please have a look.

        Show
        Lars Hofhansl added a comment - And for trunk. Please have a look.
        Hide
        Lars Hofhansl added a comment -

        Oops. Here's the right file.

        Show
        Lars Hofhansl added a comment - Oops. Here's the right file.
        Hide
        Lars Hofhansl added a comment -

        How about this patch?
        Removes unnecessary code from FilterList and fixes TestFilterList to exercise new logic correctly.
        Please have a look.

        Show
        Lars Hofhansl added a comment - How about this patch? Removes unnecessary code from FilterList and fixes TestFilterList to exercise new logic correctly. Please have a look.
        Hide
        Lars Hofhansl added a comment -

        Then again this breaks TestFilterList.testHintPassThru. But the breaking part should be removed as it tests that MUST_PASS_ALL returns the larger of the key hints, which is no longer the case.

        Show
        Lars Hofhansl added a comment - Then again this breaks TestFilterList.testHintPassThru. But the breaking part should be removed as it tests that MUST_PASS_ALL returns the larger of the key hints, which is no longer the case.
        Hide
        Lars Hofhansl added a comment -

        From your example the problem is (1) "FuzzyRow includes the filter and says move on to ColumnRange" paired with (2) "FuzzyRow returns the next row that we should use".
        Even though the FuzzyRowInclude said we should include the row the call to getNextKeyHint() returns a non-null KV.

        So from that angle we should only consult the filters that we actually called, which your patch does correctly.

        Now, what if FuzzyRowFilter had been the one that returned SEEK_NEXT_USING_HINT. Then that filter would be the one to provide the getNextKeyHint as well, and might skip right past the KV you want for the ColumnRangeFilter. You are saying in that case FuzzyRowFilter did not INCLUDE it, and thus it would be correct use its getNextKeyHint?

        And because we're short-circuiting the and when we encounter SEEK_NEXT_USING_HINT, we can safely jump to that KV. OK. Seems that is correct. Just making sure...

        In that case I only have one further comment:
        getNextKeyHint on FilterList is only called when SEEK_NEXT_USING_HINT is returned. If this FilterList is MUST_PASS_ALL the seekHintFilter must not be null, correct? So we could simplify like this:

        @@ -332,9 +337,15 @@
           @Override
           public KeyValue getNextKeyHint(KeyValue currentKV) {
             KeyValue keyHint = null;
        +    if (operator == Operator.MUST_PASS_ALL) {
        +      keyHint = seekHintFilter.getNextKeyHint(currentKV);
        +      return keyHint;
        +    }
        +
             for (Filter filter : filters) {
               KeyValue curKeyHint = filter.getNextKeyHint(currentKV);
        -      if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) {
        +      if (curKeyHint == null) {
                 // If we ever don't have a hint and this is must-pass-one, then no hint
                 return null;
               }
        @@ -344,13 +355,7 @@
                   keyHint = curKeyHint;
                   continue;
                 }
        -        // There is an existing hint
        -        if (operator == Operator.MUST_PASS_ALL &&
        -            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) < 0) {
        -          // If all conditions must pass, we can keep the max hint
        -          keyHint = curKeyHint;
        -        } else if (operator == Operator.MUST_PASS_ONE &&
        -            KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
        +        if (KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) {
                   // If any condition can pass, we need to keep the min hint
                   keyHint = curKeyHint;
                 }
        

        And then also reset the seekHintFilter in the reset() method.

        Show
        Lars Hofhansl added a comment - From your example the problem is (1) "FuzzyRow includes the filter and says move on to ColumnRange" paired with (2) "FuzzyRow returns the next row that we should use". Even though the FuzzyRowInclude said we should include the row the call to getNextKeyHint() returns a non-null KV. So from that angle we should only consult the filters that we actually called, which your patch does correctly. Now, what if FuzzyRowFilter had been the one that returned SEEK_NEXT_USING_HINT. Then that filter would be the one to provide the getNextKeyHint as well, and might skip right past the KV you want for the ColumnRangeFilter. You are saying in that case FuzzyRowFilter did not INCLUDE it, and thus it would be correct use its getNextKeyHint? And because we're short-circuiting the and when we encounter SEEK_NEXT_USING_HINT, we can safely jump to that KV. OK. Seems that is correct. Just making sure... In that case I only have one further comment: getNextKeyHint on FilterList is only called when SEEK_NEXT_USING_HINT is returned. If this FilterList is MUST_PASS_ALL the seekHintFilter must not be null, correct? So we could simplify like this: @@ -332,9 +337,15 @@ @Override public KeyValue getNextKeyHint(KeyValue currentKV) { KeyValue keyHint = null ; + if ( operator == Operator.MUST_PASS_ALL) { + keyHint = seekHintFilter.getNextKeyHint(currentKV); + return keyHint; + } + for (Filter filter : filters) { KeyValue curKeyHint = filter.getNextKeyHint(currentKV); - if (curKeyHint == null && operator == Operator.MUST_PASS_ONE) { + if (curKeyHint == null ) { // If we ever don't have a hint and this is must-pass-one, then no hint return null ; } @@ -344,13 +355,7 @@ keyHint = curKeyHint; continue ; } - // There is an existing hint - if ( operator == Operator.MUST_PASS_ALL && - KeyValue.COMPARATOR.compare(keyHint, curKeyHint) < 0) { - // If all conditions must pass, we can keep the max hint - keyHint = curKeyHint; - } else if ( operator == Operator.MUST_PASS_ONE && - KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) { + if (KeyValue.COMPARATOR.compare(keyHint, curKeyHint) > 0) { // If any condition can pass, we need to keep the min hint keyHint = curKeyHint; } And then also reset the seekHintFilter in the reset() method.
        Hide
        Viral Bajaria added a comment -

        The current patch only calls the filter that gave the SEEK_NEXT_USING_HINT, we don't go through all the filters in the FilterList if the operator is MUST_PASS_ALL.

        For MUST_PASS_ONE, the logic is to select the minimum of the hints and thus we will not skip the rows/columns even if one of the filters suggests to jump over since we are going to keep the minimum.

        Show
        Viral Bajaria added a comment - The current patch only calls the filter that gave the SEEK_NEXT_USING_HINT, we don't go through all the filters in the FilterList if the operator is MUST_PASS_ALL. For MUST_PASS_ONE, the logic is to select the minimum of the hints and thus we will not skip the rows/columns even if one of the filters suggests to jump over since we are going to keep the minimum.
        Hide
        Lars Hofhansl added a comment -

        I see.

        But what you hit a scenario where the FuzzyRowFilter would also return SEEK_NEXT_USING_HINT and it happens to be first in the filter list? In that case you'd still seek past columns that should be included. So the problem is that SEEK_NEXT_USING_HINT is not transitive to following columns/row.

        It seems the only 100% safe way to do this in a FilterList is treat any seek optimization as a simple SKIP.

        Show
        Lars Hofhansl added a comment - I see. But what you hit a scenario where the FuzzyRowFilter would also return SEEK_NEXT_USING_HINT and it happens to be first in the filter list? In that case you'd still seek past columns that should be included. So the problem is that SEEK_NEXT_USING_HINT is not transitive to following columns/row. It seems the only 100% safe way to do this in a FilterList is treat any seek optimization as a simple SKIP.
        Hide
        Viral Bajaria added a comment -

        If we take the largest seekHint, the problem can be explained as follows:

        • FilterList with FuzzyRow + ColumnRange with MUST_PASS_ALL.
        • FuzzyRow includes the filter and says move on to ColumnRange.
        • ColumnRange says first column is not a match and I can give you a better seekHint
        • FilterList calls seekHint on both FuzzyRow and ColumnRange. FuzzyRow returns the next row that we should use while ColumnRange returns the next column from the originally selected row. If we keep max here then we move on to the next row and do no return the columns from the row that should have been returned. The test case proves that this is what happened originally (though I removed the TestFail.patch due to some Hadoop QA issues)

        Yes the current changes work even if you change the ordering of filters. The test in the patch verified that behavior too.

        Show
        Viral Bajaria added a comment - If we take the largest seekHint, the problem can be explained as follows: FilterList with FuzzyRow + ColumnRange with MUST_PASS_ALL. FuzzyRow includes the filter and says move on to ColumnRange. ColumnRange says first column is not a match and I can give you a better seekHint FilterList calls seekHint on both FuzzyRow and ColumnRange. FuzzyRow returns the next row that we should use while ColumnRange returns the next column from the originally selected row. If we keep max here then we move on to the next row and do no return the columns from the row that should have been returned. The test case proves that this is what happened originally (though I removed the TestFail.patch due to some Hadoop QA issues) Yes the current changes work even if you change the ordering of filters. The test in the patch verified that behavior too.
        Hide
        Lars Hofhansl added a comment - - edited

        So thinking about this again. Why can't we take the largest of any seekHint when we and all the filters together in a FilterList?
        In your case if you add the filters to the list in a different order, will your patch still work?

        Is the actual problem here that a Filter returns a KV from getNextKeyHint even when it would not have returned SEEK_NEXT_USING_HINT?

        Show
        Lars Hofhansl added a comment - - edited So thinking about this again. Why can't we take the largest of any seekHint when we and all the filters together in a FilterList? In your case if you add the filters to the list in a different order, will your patch still work? Is the actual problem here that a Filter returns a KV from getNextKeyHint even when it would not have returned SEEK_NEXT_USING_HINT?
        Hide
        Viral Bajaria added a comment -

        Lars Hofhansl Can you review the patch when you get a chance ? I have already deployed this to my production cluster and have not had any issues.

        Show
        Viral Bajaria added a comment - Lars Hofhansl Can you review the patch when you get a chance ? I have already deployed this to my production cluster and have not had any issues.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595522/HBASE-9079-trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 hadoop1.0. The patch compiles against the hadoop 1.0 profile.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 lineLengths. The patch does not introduce lines longer than 100

        +1 site. The mvn site goal succeeds with this patch.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595522/HBASE-9079-trunk.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. +1 hadoop1.0 . The patch compiles against the hadoop 1.0 profile. +1 hadoop2.0 . The patch compiles against the hadoop 2.0 profile. +1 javadoc . The javadoc tool did not generate any warning messages. +1 javac . The applied patch does not increase the total number of javac compiler warnings. +1 findbugs . The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit . The applied patch does not increase the total number of release audit warnings. +1 lineLengths . The patch does not introduce lines longer than 100 +1 site . The mvn site goal succeeds with this patch. +1 core tests . The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6562//console This message is automatically generated.
        Hide
        Viral Bajaria added a comment -

        reattaching trunk patch since Hadoop QA wasn't picking both of them due to same timestamp.

        Show
        Viral Bajaria added a comment - reattaching trunk patch since Hadoop QA wasn't picking both of them due to same timestamp.
        Hide
        Viral Bajaria added a comment -

        Removed the TestFail and TestSuccess patches which were only here to demonstrate what was breaking.

        Show
        Viral Bajaria added a comment - Removed the TestFail and TestSuccess patches which were only here to demonstrate what was breaking.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595494/HBASE-9079-0.94.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6561//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595494/HBASE-9079-0.94.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6561//console This message is automatically generated.
        Hide
        Viral Bajaria added a comment -

        Fixed the long lines as recommended by Ted.

        Show
        Viral Bajaria added a comment - Fixed the long lines as recommended by Ted.
        Hide
        Ted Yu added a comment -

        There're a few long lines in test:

        +    ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter(Bytes.toBytes(cqStart), true, Bytes.toBytes(4), true);
        

        Lars Hofhansl:
        What do you think of latest patch ?

        Show
        Ted Yu added a comment - There're a few long lines in test: + ColumnRangeFilter columnRangeFilter = new ColumnRangeFilter(Bytes.toBytes(cqStart), true , Bytes.toBytes(4), true ); Lars Hofhansl : What do you think of latest patch ?
        Hide
        Viral Bajaria added a comment -

        (pressed enter too soon when attaching file... no easy way to edit a comment)

        I have uploaded a new patch for trunk after refreshing my workspace. I think the switch between branches wasn't clean for me when I did it the first time.

        The current patch should work fine on trunk too. I also cleaned up the TODO comment since there is no Configuration object anymore in FilterList. Also cleaned up the typo in the javadocs for areSerializedFieldsEqual()

        Show
        Viral Bajaria added a comment - (pressed enter too soon when attaching file... no easy way to edit a comment) I have uploaded a new patch for trunk after refreshing my workspace. I think the switch between branches wasn't clean for me when I did it the first time. The current patch should work fine on trunk too. I also cleaned up the TODO comment since there is no Configuration object anymore in FilterList. Also cleaned up the typo in the javadocs for areSerializedFieldsEqual()
        Hide
        Viral Bajaria added a comment -

        new patch for trunk

        Show
        Viral Bajaria added a comment - new patch for trunk
        Hide
        Ted Yu added a comment -

        Here was the checkin for trunk:

        r1499851 | tedyu | 2013-07-04 12:59:24 -0700 (Thu, 04 Jul 2013) | 3 lines
        
        HBASE-8847 Filter.transform() always applies unconditionally, even when combined in a FilterList (Christophe Taton)
        
        Show
        Ted Yu added a comment - Here was the checkin for trunk: r1499851 | tedyu | 2013-07-04 12:59:24 -0700 (Thu, 04 Jul 2013) | 3 lines HBASE-8847 Filter.transform() always applies unconditionally, even when combined in a FilterList (Christophe Taton)
        Hide
        Viral Bajaria added a comment -

        Oh wait! I do see all those changes on 0.94 branch, I don't see those changes on trunk right now. Which is why I said that FilterList on trunk is not in sync with that on 0.94

        Show
        Viral Bajaria added a comment - Oh wait! I do see all those changes on 0.94 branch, I don't see those changes on trunk right now. Which is why I said that FilterList on trunk is not in sync with that on 0.94
        Hide
        Ted Yu added a comment -
        Show
        Ted Yu added a comment - That's strange. I saw the changes from https://issues.apache.org/jira/secure/attachment/12592491/HBASE-8847.base%3D0.94.diff in 0.94 branch.
        Hide
        Viral Bajaria added a comment -

        Is it not a good idea to work from the github.com branches ? I was working off the latest 0.94 branch and did a pull again but don't see the changes that HBASE-8847 made to it.

        What am I missing ?

        Show
        Viral Bajaria added a comment - Is it not a good idea to work from the github.com branches ? I was working off the latest 0.94 branch and did a pull again but don't see the changes that HBASE-8847 made to it. What am I missing ?
        Hide
        Ted Yu added a comment -

        @Viral:
        The difference you saw in FilterList was due to HBASE-8847 which went into 0.94.10

        -  private KeyValue transformedKV = null;
        

        Please refresh your workspace and put your changes while keeping HBASE-8847

        Show
        Ted Yu added a comment - @Viral: The difference you saw in FilterList was due to HBASE-8847 which went into 0.94.10 - private KeyValue transformedKV = null ; Please refresh your workspace and put your changes while keeping HBASE-8847
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12595268/HBASE-9079-0.94.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6548//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12595268/HBASE-9079-0.94.patch against trunk revision . +1 @author . The patch does not contain any @author tags. +1 tests included . The patch appears to include 3 new or modified tests. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6548//console This message is automatically generated.
        Hide
        Viral Bajaria added a comment -

        Uploaded patch for both 0.94 and trunk. Interestingly 0.94 FilterList and trunk FilterList are not in sync. Is that expected ?

        I added the test to trunk too and tested it too.

        Show
        Viral Bajaria added a comment - Uploaded patch for both 0.94 and trunk. Interestingly 0.94 FilterList and trunk FilterList are not in sync. Is that expected ? I added the test to trunk too and tested it too.
        Hide
        Ted Yu added a comment -

        Maybe I could limit the scope of this ticket to MUST_PASS_ALL

        Fine with me.

        Show
        Ted Yu added a comment - Maybe I could limit the scope of this ticket to MUST_PASS_ALL Fine with me.
        Hide
        Viral Bajaria added a comment -

        What kind of re-ordering will you do ? Isn't the re-ordering more dependent on what kind of ordering the user wants in the FilterList ? i.e. apply my PrefixFilter first, then FuzzyRow then ColumnRange. If the user says apply PrefixFilter, then ColumnRange and then FuzzyRow should we not preserve that ordering ?

        I also take my words back on the issue existing in the current code. I think it does not because for MUST_PASS_ONE it keeps the min rowkey as the hint while for MUST_PASS_ALL it keeps the max. Maybe I could limit the scope of this ticket to MUST_PASS_ALL and keep MUST_PASS_ONE as-is ?

        Show
        Viral Bajaria added a comment - What kind of re-ordering will you do ? Isn't the re-ordering more dependent on what kind of ordering the user wants in the FilterList ? i.e. apply my PrefixFilter first, then FuzzyRow then ColumnRange. If the user says apply PrefixFilter, then ColumnRange and then FuzzyRow should we not preserve that ordering ? I also take my words back on the issue existing in the current code. I think it does not because for MUST_PASS_ONE it keeps the min rowkey as the hint while for MUST_PASS_ALL it keeps the max. Maybe I could limit the scope of this ticket to MUST_PASS_ALL and keep MUST_PASS_ONE as-is ?
        Hide
        Ted Yu added a comment -

        For trunk, can we introduce the following method to Filter:

          public KeyHintType getKeyHintType() {
        

        where KeyHintType is an enum that can carry NONE, ROW or COL.

        FilterList can poll the Filters and reorder them.

        Show
        Ted Yu added a comment - For trunk, can we introduce the following method to Filter: public KeyHintType getKeyHintType() { where KeyHintType is an enum that can carry NONE, ROW or COL. FilterList can poll the Filters and reorder them.
        Hide
        Ted Yu added a comment -

        w.r.t. ordering, there is no method in FilterBase which can tell us whether the hint provider operates at row or column level.

        For 0.96 / trunk, we may add such a method so that FilterList can (re)order the Filters accordingly.

        For 0.94, we can provide documentation on this aspect so that user can register Filters in correct order.

        you mean against the 0.95/0.96 tree ?

        Yes. I meant patch against trunk.

        Thanks

        Show
        Ted Yu added a comment - w.r.t. ordering, there is no method in FilterBase which can tell us whether the hint provider operates at row or column level. For 0.96 / trunk, we may add such a method so that FilterList can (re)order the Filters accordingly. For 0.94, we can provide documentation on this aspect so that user can register Filters in correct order. you mean against the 0.95/0.96 tree ? Yes. I meant patch against trunk. Thanks
        Hide
        Viral Bajaria added a comment -

        I will upload a new patch with the fixes that Ted pointed out.

        Ted Yu When you say trunk patch you mean against the 0.95/0.96 tree ?

        Regards Lars comment on turning it around to ==, I could move it to the following prior to even running the for loop:

        if (seekHintFilter != null) {
          return seekHintFilter.getNextKeyHint();
        }
        

        Regarding the ordering, I think the issue will be when operator is MUST_PASS_ONE and both filters want to give you a SEEK_HINT but one of them is operating at the row level while the other is operating at the column level. For example, if ColumnRange comes before FuzzyRow and operator is MUST_PASS_ONE, we will iterate through both the filters filterKeyValue method and keep the state returned from FuzzyRow and not from ColumnRange. I think this issue exists in current code too since we go through each filter and keep the max row.

        Personally I feel it's not a good use-case to make a FilterList with one filter operating at the row level and another at the column level and asking the operator to be MUST_PASS_ONE. That's almost like saying that keep a column even if row does not match. Any suggestions on what should be done here ?

        Show
        Viral Bajaria added a comment - I will upload a new patch with the fixes that Ted pointed out. Ted Yu When you say trunk patch you mean against the 0.95/0.96 tree ? Regards Lars comment on turning it around to ==, I could move it to the following prior to even running the for loop: if (seekHintFilter != null ) { return seekHintFilter.getNextKeyHint(); } Regarding the ordering, I think the issue will be when operator is MUST_PASS_ONE and both filters want to give you a SEEK_HINT but one of them is operating at the row level while the other is operating at the column level. For example, if ColumnRange comes before FuzzyRow and operator is MUST_PASS_ONE, we will iterate through both the filters filterKeyValue method and keep the state returned from FuzzyRow and not from ColumnRange. I think this issue exists in current code too since we go through each filter and keep the max row. Personally I feel it's not a good use-case to make a FilterList with one filter operating at the row level and another at the column level and asking the operator to be MUST_PASS_ONE. That's almost like saying that keep a column even if row does not match. Any suggestions on what should be done here ?
        Hide
        Lars Hofhansl added a comment -

        I also have a question about this:

        +      if (seekHintFilter != null && seekHintFilter != filter) {
        +        //get hint from the filter that was responsible for the
        +        //SEEK_NEXT_USING_HINT code
        +        continue;
        +      }
        

        As Ted asks... It seems only one filter should provide the hint. Can we turn this around and return filter.getNextKeyHint(...) if seekFilter == filter?

        Show
        Lars Hofhansl added a comment - I also have a question about this: + if (seekHintFilter != null && seekHintFilter != filter) { + //get hint from the filter that was responsible for the + //SEEK_NEXT_USING_HINT code + continue ; + } As Ted asks... It seems only one filter should provide the hint. Can we turn this around and return filter.getNextKeyHint(...) if seekFilter == filter ?
        Hide
        Ted Yu added a comment -

        For TestFuzzyAndColumnRangeFilter, please add license.

        Can you provide trunk patch so that we can let Hadoop QA run through it ?

        +        FilterList filterList = new FilterList(Lists.<Filter>newArrayList(fuzzyRowFilter, columnRangeFilter));
        

        Can you alter the order of the two filters above so that we know the correctness isn't dependent on ordering of the Filters ?
        Meaning both orders are tested.

        Indentation is off - it should be two spaces for each level of indentation.

        +                LOG.info("Got rk: " + Bytes.toStringBinary(kv.getRow()) + " cq: " + Bytes.toStringBinary(kv.getQualifier()));
        

        Length limit should be 100 per line.

        In getNextKeyHint():

             for (Filter filter : filters) {
        +      if (seekHintFilter != null && seekHintFilter != filter) {
        +        //get hint from the filter that was responsible for the
        +        //SEEK_NEXT_USING_HINT code
        +        continue;
        

        Does the above if block mean that only one Filter which provides seek hint would be respected ?

        Show
        Ted Yu added a comment - For TestFuzzyAndColumnRangeFilter, please add license. Can you provide trunk patch so that we can let Hadoop QA run through it ? + FilterList filterList = new FilterList(Lists.<Filter>newArrayList(fuzzyRowFilter, columnRangeFilter)); Can you alter the order of the two filters above so that we know the correctness isn't dependent on ordering of the Filters ? Meaning both orders are tested. Indentation is off - it should be two spaces for each level of indentation. + LOG.info( "Got rk: " + Bytes.toStringBinary(kv.getRow()) + " cq: " + Bytes.toStringBinary(kv.getQualifier())); Length limit should be 100 per line. In getNextKeyHint(): for (Filter filter : filters) { + if (seekHintFilter != null && seekHintFilter != filter) { + //get hint from the filter that was responsible for the + //SEEK_NEXT_USING_HINT code + continue ; Does the above if block mean that only one Filter which provides seek hint would be respected ?

          People

          • Assignee:
            Viral Bajaria
            Reporter:
            Viral Bajaria
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development