HBase
  1. HBase
  2. HBASE-6468

RowCounter may return incorrect result if column name is specified in command line

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.5
    • Fix Version/s: 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The RowCounter use FirstKeyOnlyFilter regardless of whether or not the
      command line argument specified a column family (or family:qualifier).
      In case when no qualifier was specified as argument, the scan will
      give correct result. However in the other case the scan instance may
      have been set with columns other than the very first column in the
      row, causing scan to get nothing as the FirstKeyOnlyFilter removes
      everything else.

      https://issues.apache.org/jira/browse/HBASE-6042 is related.

        Activity

        Hide
        Michael Drzal added a comment -

        Ted Yu can this be closed out?

        Show
        Michael Drzal added a comment - Ted Yu can this be closed out?
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #119 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/119/)
        HBASE-6468 RowCounter may return incorrect result if column name is specified in command line (Shrijeet) (Revision 1368625)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyValueMatchingQualifiersFilter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
        • /hbase/trunk/hbase-server/src/main/protobuf/Filter.proto
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFirstKeyValueMatchingQualifiersFilter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestRowCounter.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #119 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/119/ ) HBASE-6468 RowCounter may return incorrect result if column name is specified in command line (Shrijeet) (Revision 1368625) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyValueMatchingQualifiersFilter.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java /hbase/trunk/hbase-server/src/main/protobuf/Filter.proto /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFirstKeyValueMatchingQualifiersFilter.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestRowCounter.java
        Hide
        Hudson added a comment -

        Integrated in HBase-TRUNK #3191 (See https://builds.apache.org/job/HBase-TRUNK/3191/)
        HBASE-6468 RowCounter may return incorrect result if column name is specified in command line (Shrijeet) (Revision 1368625)

        Result = FAILURE
        tedyu :
        Files :

        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyValueMatchingQualifiersFilter.java
        • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
        • /hbase/trunk/hbase-server/src/main/protobuf/Filter.proto
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFirstKeyValueMatchingQualifiersFilter.java
        • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestRowCounter.java
        Show
        Hudson added a comment - Integrated in HBase-TRUNK #3191 (See https://builds.apache.org/job/HBase-TRUNK/3191/ ) HBASE-6468 RowCounter may return incorrect result if column name is specified in command line (Shrijeet) (Revision 1368625) Result = FAILURE tedyu : Files : /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FirstKeyValueMatchingQualifiersFilter.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java /hbase/trunk/hbase-server/src/main/protobuf/Filter.proto /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFirstKeyValueMatchingQualifiersFilter.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestRowCounter.java
        Hide
        Ted Yu added a comment -

        Integrated to trunk.

        Thanks for the patch, Shrijeet.

        Show
        Ted Yu added a comment - Integrated to trunk. Thanks for the patch, Shrijeet.
        Hide
        Ted Yu added a comment -

        Latest patch looks good.

        Will integrate tomorrow if there is no objection.

        Show
        Ted Yu added a comment - Latest patch looks good. Will integrate tomorrow if there is no objection.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12538850/0005-HBASE-6468-RowCounter-may-return-incorrect-results.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 8 new or modified tests.

        +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

        -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538850/0005-HBASE-6468-RowCounter-may-return-incorrect-results.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 8 new or modified tests. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 6 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2476//console This message is automatically generated.
        Hide
        Shrijeet Paliwal added a comment -

        Last comments + change in Filter.proto

        Show
        Shrijeet Paliwal added a comment - Last comments + change in Filter.proto
        Hide
        Ted Yu added a comment -

        Git format is fine, acceptable by Hadoop QA.

        Show
        Ted Yu added a comment - Git format is fine, acceptable by Hadoop QA.
        Hide
        Shrijeet Paliwal added a comment -

        Will wait for Filter.proto to get committed. Is the patch format fine (I used git format-patch), or you want git diff --no-prefix ?

        Show
        Shrijeet Paliwal added a comment - Will wait for Filter.proto to get committed. Is the patch format fine (I used git format-patch), or you want git diff --no-prefix ?
        Hide
        Ted Yu added a comment -
        +   * Constructor which takes a list of columns. As soon as first KeyValue
        

        Now that the parameter has changed to Set<byte []>, the above javadoc should be modified.

        +   * matching any of these columns if found, filter moves to next row.
        

        'if found' -> 'is found'.

        +   * @param qualifiers the list of columns to me matched.
        

        Change to 'the set of columns to be matched'

        Looks like HBASE-6454 may go in ahead of this JIRA. So Filter.proto should have the following:

        +message FirstKeyValueMatchingQualifiersFilter {
        +}
        
        Show
        Ted Yu added a comment - + * Constructor which takes a list of columns. As soon as first KeyValue Now that the parameter has changed to Set<byte []>, the above javadoc should be modified. + * matching any of these columns if found, filter moves to next row. 'if found' -> 'is found'. + * @param qualifiers the list of columns to me matched. Change to 'the set of columns to be matched' Looks like HBASE-6454 may go in ahead of this JIRA. So Filter.proto should have the following: +message FirstKeyValueMatchingQualifiersFilter { +}
        Hide
        Shrijeet Paliwal added a comment -

        Patch off trunk.

        Show
        Shrijeet Paliwal added a comment - Patch off trunk.
        Hide
        Ted Yu added a comment -

        @Anoop:
        The behavior you described is expected for the new Filter, considering it is used for row counting.

        Show
        Ted Yu added a comment - @Anoop: The behavior you described is expected for the new Filter, considering it is used for row counting.
        Hide
        Anoop Sam John added a comment -

        So when a row r1 contains KVs with qualifier a,b,c (for a give CF) and the qualifier in FirstKeyValueMatchingQualifiersFilter are b,c
        we will include all KVs for qualifier a and one KV(1st KV) for qualifier b/c in the Result for row r1. Is this expected? I was thinking that this new Filter also will select only one KV for a row. But the selection is from a subset of qualifiers not whole.
        [KVs for qualifier a will come before b and c]

        Show
        Anoop Sam John added a comment - So when a row r1 contains KVs with qualifier a,b,c (for a give CF) and the qualifier in FirstKeyValueMatchingQualifiersFilter are b,c we will include all KVs for qualifier a and one KV(1st KV) for qualifier b/c in the Result for row r1. Is this expected? I was thinking that this new Filter also will select only one KV for a row. But the selection is from a subset of qualifiers not whole. [KVs for qualifier a will come before b and c]
        Hide
        Shrijeet Paliwal added a comment -

        FirstKeyValueMatchingQualifiersFilter -> When a CF contains qualifier a,b,c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a' ? Will this be included in the Result? Is this expected?

        Depends on whether or not first KV matching any of the columns associated with filter have been seen yet or not.

        Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a though from my side.
        By default FirstKeyOnlyFilter allow only the 1st KV (from all the qualifiers) from a CF to come in the Result and will filter out other KVs.
        Specifying a set of qualifiers to FirstKeyOnlyFilter will restrict the selection of the 1st KV from a any of these qualifiers only. It will filter out all KVs from other qualifiers.

        FKVMatchingQualifiersFilter has a peculiar behavior hence (extending it from FirstKeyOnlyFilter and) creating a new filter made sense.

        Show
        Shrijeet Paliwal added a comment - FirstKeyValueMatchingQualifiersFilter -> When a CF contains qualifier a,b,c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a' ? Will this be included in the Result? Is this expected? Depends on whether or not first KV matching any of the columns associated with filter have been seen yet or not. Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a though from my side. By default FirstKeyOnlyFilter allow only the 1st KV (from all the qualifiers) from a CF to come in the Result and will filter out other KVs. Specifying a set of qualifiers to FirstKeyOnlyFilter will restrict the selection of the 1st KV from a any of these qualifiers only. It will filter out all KVs from other qualifiers. FKVMatchingQualifiersFilter has a peculiar behavior hence (extending it from FirstKeyOnlyFilter and) creating a new filter made sense.
        Hide
        Anoop Sam John added a comment -

        FirstKeyValueMatchingQualifiersFilter -> When a CF contains qualifier a,b,c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a' ? Will this be included in the Result? Is this expected?

        Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a though from my side.
        By default FirstKeyOnlyFilter allow only the 1st KV (from all the qualifiers) from a CF to come in the Result and will filter out other KVs.
        Specifying a set of qualifiers to FirstKeyOnlyFilter will restrict the selection of the 1st KV from a any of these qualifiers only. It will filter out all KVs from other qualifiers.

        Show
        Anoop Sam John added a comment - FirstKeyValueMatchingQualifiersFilter -> When a CF contains qualifier a,b,c and the qualifiers provided are b and c, what will happen to the KVs for qualifier 'a' ? Will this be included in the Result? Is this expected? Can the qualifiers be accommodated in FirstKeyOnlyFilter only? Do we need a new Filter? Just a though from my side. By default FirstKeyOnlyFilter allow only the 1st KV (from all the qualifiers) from a CF to come in the Result and will filter out other KVs. Specifying a set of qualifiers to FirstKeyOnlyFilter will restrict the selection of the 1st KV from a any of these qualifiers only. It will filter out all KVs from other qualifiers.
        Hide
        Ted Yu added a comment -
        +    List<byte[]> qualifiers = new ArrayList<byte[]>();
        

        Would suggest replacing the List with a Set because the same qualifier might be entered more than once.

        Show
        Ted Yu added a comment - + List< byte []> qualifiers = new ArrayList< byte []>(); Would suggest replacing the List with a Set because the same qualifier might be entered more than once.
        Hide
        Ted Yu added a comment -

        Patch v2 is on the right track.

        There were some spelling mistakes.

        There was unnecessary formatting in RowCounter.java

        It's a pity that mapred/RowCounter.java doesn't use FirstKeyOnlyFilter. It's up to you whether you want to cover that class in this JIRA.

        TestFirstKeyMatchingQualifiersFilter should be named TestFirstKeyValueMatchingQualifiersFilter.

        Show
        Ted Yu added a comment - Patch v2 is on the right track. There were some spelling mistakes. There was unnecessary formatting in RowCounter.java It's a pity that mapred/RowCounter.java doesn't use FirstKeyOnlyFilter. It's up to you whether you want to cover that class in this JIRA. TestFirstKeyMatchingQualifiersFilter should be named TestFirstKeyValueMatchingQualifiersFilter.
        Hide
        Shrijeet Paliwal added a comment -

        This patch is again off 0.90 branch. Take a look if it looks fine I will generate a final patch off of trunk. I have not taken care of 'year' in copyright. Will do that in final patch.

        Show
        Shrijeet Paliwal added a comment - This patch is again off 0.90 branch. Take a look if it looks fine I will generate a final patch off of trunk. I have not taken care of 'year' in copyright. Will do that in final patch.
        Hide
        Ted Yu added a comment -

        My first comment lacks clarity: FirstKeyOnlyFilter would only deal with first key.

        The use case you listed above adds some complexity. Assuming the table is very wide, there is reason for user to do that.

        We can create new Filter whose ctor takes the columns user specifies. Its filterKeyValue() method would look for the given columns in KeyValue parameter. Once there is a match for any one of the columns, we can return ReturnCode.NEXT_ROW for remaining KeyValues in the row.

        Hope this helps.

        Show
        Ted Yu added a comment - My first comment lacks clarity: FirstKeyOnlyFilter would only deal with first key. The use case you listed above adds some complexity. Assuming the table is very wide, there is reason for user to do that. We can create new Filter whose ctor takes the columns user specifies. Its filterKeyValue() method would look for the given columns in KeyValue parameter. Once there is a match for any one of the columns, we can return ReturnCode.NEXT_ROW for remaining KeyValues in the row. Hope this helps.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12538216/0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2448//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538216/0001-HBASE-6468-RowCounter-may-return-incorrect-result.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2448//console This message is automatically generated.
        Hide
        Shrijeet Paliwal added a comment -

        I do not know enough about filters, so help me here. If we go about modifying FirstKeyOnlyFilter, how does one handle a case when rowcounter was invoked with something like this : "rowcounter table f1:c1 f2:c2" . Create two instances of FirstKeyOnlyFilter (one for each pair)?

        Show
        Shrijeet Paliwal added a comment - I do not know enough about filters, so help me here. If we go about modifying FirstKeyOnlyFilter, how does one handle a case when rowcounter was invoked with something like this : "rowcounter table f1:c1 f2:c2" . Create two instances of FirstKeyOnlyFilter (one for each pair)?
        Hide
        Ted Yu added a comment -
        +    if (!scanHasColumns) {
        +      scan.setFilter(new FirstKeyOnlyFilter());
        +    }
        

        Can we enhance FirstKeyOnlyFilter so that the filter checks for designated column before returning ReturnCode.NEXT_ROW ?
        The rationale is that we want to speed up scan in case the designated column appears in the first half of the row.

        Please generate patch for trunk first.

        No year is needed in license header.

        Thanks

        Show
        Ted Yu added a comment - + if (!scanHasColumns) { + scan.setFilter( new FirstKeyOnlyFilter()); + } Can we enhance FirstKeyOnlyFilter so that the filter checks for designated column before returning ReturnCode.NEXT_ROW ? The rationale is that we want to speed up scan in case the designated column appears in the first half of the row. Please generate patch for trunk first. No year is needed in license header. Thanks

          People

          • Assignee:
            Shrijeet Paliwal
            Reporter:
            Shrijeet Paliwal
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development