HBase / HBASE-6954

Column-counting filters can accept multiple versions of a column

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Filters
    • Labels:
      None

      Description

      It looks like the max versions limit for a table or scanner is not applied, to discard older versions, before columns are counted in a ColumnPaginationFilter or ColumnCountGetFilter. As a result, if multiple versions of a column have been added to a row, a Scan or Get can return fewer columns than requested even though the row holds enough columns to satisfy the request.

      A minimal test case demonstrating this behavior is attached.

      The javadoc for Get states: 'Only Filter.filterKeyValue(KeyValue) is called AFTER all tests for ttl, column match, deletes and max versions have been run.' For these two filters this does not appear to hold, since multiple versions seem to be flattened after the filter has been applied.
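
      The effect can be modeled outside HBase with a small simulation. The sketch below is not HBase code: the class FilterOrderingSketch, the Cell type, and the methods countingFilter and versionLimit are hypothetical stand-ins for a column-counting filter and a max versions check. It assumes the filter counts every cell it is offered, and it only illustrates how the order of the two steps changes the number of columns returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model only; none of these names exist in HBase.
public class FilterOrderingSketch {

    /** A cell reduced to its column qualifier and timestamp. */
    static final class Cell {
        final String qualifier;
        final long timestamp;

        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
    }

    /** Stand-in for a column-counting filter: keeps the first `limit` cells, counting every version. */
    static List<Cell> countingFilter(List<Cell> cells, int limit) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (out.size() >= limit) {
                break;
            }
            out.add(c);
        }
        return out;
    }

    /** Stand-in for the max versions check: keeps at most `maxVersions` cells per qualifier. */
    static List<Cell> versionLimit(List<Cell> cells, int maxVersions) {
        Map<String, Integer> seen = new HashMap<>();
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            int n = seen.merge(c.qualifier, 1, Integer::sum);
            if (n <= maxVersions) {
                out.add(c);
            }
        }
        return out;
    }

    /** Ordering described in this issue: the filter runs first, versions are trimmed afterwards. */
    static List<Cell> filterThenVersions(List<Cell> row, int limit, int maxVersions) {
        return versionLimit(countingFilter(row, limit), maxVersions);
    }

    /** Ordering the reporter expected: versions are trimmed first, then the filter counts. */
    static List<Cell> versionsThenFilter(List<Cell> row, int limit, int maxVersions) {
        return countingFilter(versionLimit(row, maxVersions), limit);
    }

    /** One row, sorted by qualifier and newest-first: column a has 3 versions, b and c have 1 each. */
    static List<Cell> sampleRow() {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("a", 3));
        row.add(new Cell("a", 2));
        row.add(new Cell("a", 1));
        row.add(new Cell("b", 1));
        row.add(new Cell("c", 1));
        return row;
    }

    public static void main(String[] args) {
        List<Cell> row = sampleRow();
        // Request 3 columns with max versions = 1.
        System.out.println(filterThenVersions(row, 3, 1).size()); // prints 1
        System.out.println(versionsThenFilter(row, 3, 1).size()); // prints 3
    }
}
```

      In this model, filtering first spends the whole limit of 3 on the three versions of column a, so only one column survives the version trim even though the row has three distinct columns.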

          Activity

          Asaf Mesika added a comment -

          I agree. If you set versions=3 you would expect to have no more than 3 in your filter. Maybe the column counting can be done a little bit differently to address the problem in the code comment.

          Andrew Olson added a comment -

          Thanks Lars. Yes, I do think that it should be addressed to prevent the unexpected results – took us some time to figure out this was the cause of one of our applications hanging. HBASE-5257 seems like a reasonable change to make, although it looks like HBASE-5104 already addresses our particular scenario by introducing getMaxResultsPerColumnFamily() and getRowOffsetPerColumnFamily() methods for the Get and Scan classes. If that is the case, perhaps ColumnPaginationFilter and ColumnCountGetFilter could be deprecated?

          Lars Hofhansl added a comment -

          As I pointed out on the mailing list, this is by design.
          Here's a comment to that effect in ScanQueryMatcher.java:

              /**
               * Filters should be checked before checking column trackers. If we do
               * otherwise, as was previously being done, ColumnTracker may increment its
               * counter for even that KV which may be discarded later on by Filter. This
               * would lead to incorrect results in certain cases.
               */
          

          There are cases where folks want filters before the version counting and cases where they want it after the version counting.
          Can't have it both ways, unless we're adding mechanisms to have the filter decide.

          I made an effort to do this in HBASE-5257; I can revive that if there is demand.

          Ted Yu added a comment -

          @Andrew:
          There is already hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestColumnPaginationFilter.java in the source repo.
          Can you fold your test into that class?

          Thanks


            People

            • Assignee: Unassigned
            • Reporter: Andrew Olson
            • Votes: 1
            • Watchers: 6
