HBase
HBASE-5416

Filter on one CF and if a match, then load and return full row (WAS: Improve performance of scans with some kind of filters)

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      A new method is added to Filter which allows a filter to specify which CF is needed for its operation.

      public boolean isFamilyEssential(byte[] name);

      When a new row is considered, only the data for the essential families is loaded and the filter is applied. Only if the filter accepts the row is the rest of the data loaded.

      This feature is off by default. You can use Scan.setLoadColumnFamiliesOnDemand() to enable it on a per-Scan basis. If not set on the Scan, the boolean value of "hbase.hregion.scan.loadColumnFamiliesOnDemand" is used (defaults to false).
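
      For illustration, a custom filter could opt in by overriding the new method. The sketch below assumes a hypothetical FlagsOnlyFilter class and a "flags" family, neither of which is part of the patch:

      import org.apache.hadoop.hbase.filter.FilterBase;
      import org.apache.hadoop.hbase.util.Bytes;

      // Hypothetical filter that needs only the small "flags" family to decide.
      public class FlagsOnlyFilter extends FilterBase {
        private static final byte[] ESSENTIAL_CF = Bytes.toBytes("flags");

        // Only essential families are loaded before the filter runs; the
        // remaining families are loaded only for rows the filter accepts.
        @Override
        public boolean isFamilyEssential(byte[] name) {
          return Bytes.equals(name, ESSENTIAL_CF);
        }
      }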

      Description

      When a scan is performed, the whole row is loaded into the result list, and after that the filter (if one exists) is applied to decide whether the row is needed.

      But when a scan is performed over several CFs and the filter checks data from only a subset of them, the data from the CFs the filter does not check is not needed at the filtering stage - only once we have decided to include the current row. In such cases we can significantly reduce the amount of IO performed by a scan by loading only the values the filter actually checks.

      For example, we have two CFs: flags and snap. Flags is quite small (a bunch of megabytes) and is used to filter large entries out of snap. Snap is very large (tens of GB) and is quite costly to scan. If we need only the rows with some flag set, we use SingleColumnValueFilter to limit the result to a small subset of the region. But the current implementation loads both CFs to perform the scan, when only a small subset is needed.
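
      Such a scan could be set up roughly as follows (a sketch; the "state" qualifier and "ready" value are made up for illustration):

      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("flags"));
      scan.addFamily(Bytes.toBytes("snap"));
      // Keep only rows whose flags:state column equals "ready".
      SingleColumnValueFilter filter = new SingleColumnValueFilter(
          Bytes.toBytes("flags"), Bytes.toBytes("state"),
          CompareFilter.CompareOp.EQUAL, Bytes.toBytes("ready"));
      filter.setFilterIfMissing(true);  // skip rows lacking the flag column
      scan.setFilter(filter);
      // With this patch: load "snap" only for rows the filter accepts.
      scan.setLoadColumnFamiliesOnDemand(true);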

      The attached patch adds one routine to the Filter interface that allows a filter to specify which CFs are needed for its operation. In HRegion, we separate all scanners into two groups: those needed by the filter and the rest (joined). When a new row is considered, only the needed data is loaded and the filter applied; only if the filter accepts the row is the rest of the data loaded. On our data, this speeds up such scans 30-50 times. This also gives us a way to better normalize the data into separate columns by optimizing the scans performed.

      1. Filtered_scans.patch
        8 kB
        Max Lapan
      2. Filtered_scans_v2.patch
        10 kB
        Max Lapan
      3. Filtered_scans_v3.patch
        16 kB
        Max Lapan
      4. Filtered_scans_v4.patch
        16 kB
        Max Lapan
      5. 5416-v5.txt
        16 kB
        Ted Yu
      6. 5416-v6.txt
        15 kB
        Ted Yu
      7. Filtered_scans_v5.patch
        22 kB
        Max Lapan
      8. Filtered_scans_v5.1.patch
        23 kB
        Max Lapan
      9. 5416-Filtered_scans_v6.patch
        22 kB
        Ted Yu
      10. Filtered_scans_v7.patch
        32 kB
        Max Lapan
      11. HBASE-5416-v7-rebased.patch
        32 kB
        Sergey Shelukhin
      12. HBASE-5416-v8.patch
        59 kB
        Sergey Shelukhin
      13. HBASE-5416-v9.patch
        59 kB
        Sergey Shelukhin
      14. HBASE-5416-v10.patch
        60 kB
        Sergey Shelukhin
      15. HBASE-5416-v11.patch
        60 kB
        Sergey Shelukhin
      16. HBASE-5416-v12.patch
        64 kB
        Sergey Shelukhin
      17. HBASE-5416-v12.patch
        64 kB
        Sergey Shelukhin
      18. 5416-v13.patch
        59 kB
        Ted Yu
      19. 5416-0.94-v1.txt
        33 kB
        Lars Hofhansl
      20. 5416-v14.patch
        59 kB
        Ted Yu
      21. 5416-v15.patch
        59 kB
        Ted Yu
      22. 5416-0.94-v2.txt
        33 kB
        Lars Hofhansl
      23. 5416-v16.patch
        59 kB
        Ted Yu
      24. org.apache.hadoop.hbase.regionserver.TestHRegion-output.txt
        4.23 MB
        Ted Yu
      25. 5416-0.94-v3.txt
        33 kB
        Lars Hofhansl
      26. 5416-drop-new-method-from-filter.txt
        5 kB
        Ted Yu
      27. 5416-TestJoinedScanners-0.94.txt
        7 kB
        Ted Yu


          Activity

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514861/0001-Optimization-of-scans-using-filters.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/971//console

          This message is automatically generated.

          Max Lapan added a comment -

          Patch against 0.90.4

          Max Lapan added a comment -

          Patch against trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514884/Filtered-scans_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/973//console

          This message is automatically generated.

          stack added a comment -

          Interesting idea. Patch looks pretty non-invasive to add some nice functionality. Would appreciate some better doc in the filters package doc or over in the manual to accompany this change. Nice one Max.

          Comments on patch:

          Please follow the convention you see in the surrounding file and parenthesize code blocks. E.g. in the below:

          +
          +  @Override
          +  public boolean isFamilyEssential(byte[] name) {
          +    for (Filter filter : filters)
          +      if (filter.isFamilyEssential(name))
          +        return true;
          +    return false;
          +  }
          

          What is a 'joinedScanner' in the below:

          +      List<KeyValueScanner> joinedScanners = new ArrayList<KeyValueScanner>();
          
          

          It needs a bit of a comment I'd say.

          Why drop the check for empty results in below?

          -          if (results.isEmpty() || filterRow()) {
          +          boolean filtered = filterRow();
          

          Please submit a patch with a --no-prefix so we can see how your patch does against hadoopqa.

          Max Lapan added a comment -

          The empty result is checked later, after the values are loaded from the joinedHeap scanners. If we check before loading, we can lose rows when the filter is SingleColumnValueExcludeFilter.

          Nicolas Spiegelberg added a comment -

          I'm confused about the implementation. We already have a way to load only a single CF's data in a scan. Why don't you just use a 2-phase RPC in the HBase client, where the first phase scans 'flag' and then issues explicit multi-gets to the 'snap' family? Additionally, you can use bloom filters to filter out unnecessary HFiles if you're doing this on an actively-written system.

          This sounds like an application detail or possibly coprocessor use case instead of something that should belong in the core. Maybe I'm missing something?
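
          For concreteness, the two-phase pattern described above might look like the following sketch (the table variable and the matches() helper are hypothetical):

          // Phase 1: scan only the small 'flag' family, collect matching row keys.
          List<Get> gets = new ArrayList<Get>();
          ResultScanner flagRows = table.getScanner(new Scan().addFamily(Bytes.toBytes("flag")));
          for (Result r : flagRows) {
            if (matches(r)) {  // client-side check of the flag value
              Get get = new Get(r.getRow());
              get.addFamily(Bytes.toBytes("snap"));
              gets.add(get);
            }
          }
          flagRows.close();
          // Phase 2: explicit multi-get into the large 'snap' family.
          Result[] snapRows = table.get(gets);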

          stack added a comment -

          @Max You need a test too.

          Max Lapan added a comment -

          In our case, multi-gets are not a solution, because our schema has many more than 2 CFs (that was only an example). We have 7 CFs, and different scans use different sets of CFs. Also, we have no prior knowledge of how many rows will be accepted by the filter, so it could be too many gets.

          Bloom filters also don't help much, because they filter out whole files, not blocks as seek() does.

          Max Lapan added a comment -

          @stack ok

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12514892/Filtered-scans_trunk.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 157 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.filter.TestFilter

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/974//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/974//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/974//console

          This message is automatically generated.

          Nicolas Spiegelberg added a comment -

          @Max, you can use Scan.setBatch() and Scan.setMaxResultsPerColumnFamily() to limit your batching factor to stream this operation. The main advantage of a 1-pass solution versus 2-pass is read atomicity across CFs, but that isn't guaranteed in 90 (see HBASE-2856). I'm just trying to think about proper API design. Scan is an easy API to accumulate functionality. It seems like this is emulating a server-side, 2-phase filter & join.

          Max Lapan added a comment -

          @Nicolas: Hmm, I don't understand how to implement the 2-phase approach in a map-reduce job without extra complications. Also, I haven't found Scan.setMaxResultsPerColumnFamily in the current hbase source, only your patch for 0.89.

          Atomicity is not critical to us in this case - only performance and usage simplicity.

          Max Lapan added a comment -

          Fixed a mistake where seek() sometimes skipped rows.

          Style fixes and extra comments.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515249/Filtered_scans.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.filter.TestFilter

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/993//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/993//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/993//console

          This message is automatically generated.

          stack added a comment -

          @Max What do you think about the failed TestFilter above? Is it your patch? Thanks.

          Max Lapan added a comment -

          Quite probably; I plan to find that out today.

          Would appreciate some better doc in the filters package doc or over in the manual to accompany this change.

          I have a question about this. "Manual" == the hbase book? And what is the 'filters package doc'? Is it comments in the source processed by javadoc, or something else? Sorry for these questions - I have no Java experience.

          stack added a comment -

          I have a question about this. "Manual" == the hbase book? And what is the 'filters package doc'? Is it comments in the source processed by javadoc, or something else? Sorry for these questions - I have no Java experience.

          No problem.

          Yes, the 'reference guide' or manual is this: http://hbase.apache.org/book.html It's a bit tough making a patch for it if you don't know DocBook too well, so you could just put a paragraph here and I'll get the doc in for you. Or, the filters package doc I was referring to is here: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/package-summary.html#package_description ... but the doc there is pretty pathetic, and describing this facility there might not go so well (it's of a subtlety the current doc does not allow).

          Just stick a bit of a paragraph here and I'll figure where to put it.

          Go easy Max.

          You saw the failed test above? The fail in TestFilter? Do you see that when you run your tests local? On trunk you do it so:

          % mvn test -P localTests -Dtest=TestFilter
          
          Max Lapan added a comment -

          Thanks for the instructions. It's caused by internal filter state being broken by my patch (filterRow called at the wrong time). Working on that.

          Max Lapan added a comment -

          The problem was a little bit trickier than I expected.

          The failed tests are caused by PageFilter and WhileMatchFilter expecting the filterRow method to be called only once per non-empty row. The previous version of the patch broke this, so the tests failed. I resolved this by checking that the row is not empty right before filterRow(List) is called, but this required slightly modifying the SingleColumnValueExcludeFilter logic: the exclude phase moves from the filterKeyValue method to filterRow(List). The main reason is that, at the RegionScanner::nextInternal level, there is no way to distinguish a row which is empty because the filter accepted it but excluded all its KVs from a row which is empty because the filter rejected it.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515558/Filtered_scans_v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -136 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 151 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.filter.TestSingleColumnValueExcludeFilter

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1007//console

          This message is automatically generated.

          Max Lapan added a comment -

          Fixed all the failed tests, and added a test for the joined scanners functionality.

          Ted Yu added a comment -

          @Max:
          This is a useful feature.
          I see some typos, e.g. in Filter.isFamilyEssential():

          +   * filters are always return true here, but some could have more sophisticated
          

          should read 'filters always return true'.
          You should also add @param for name parameter.

          Do you mind uploading the latest patch to https://reviews.apache.org (leaving the Bugs field empty)?
          That would make reviewing go more smoothly.

          Thomas Pan added a comment -

          This really looks like a very interesting patch. Just want to add my 2 cents and verify a use case without thoroughly reviewing the whole implementation. Here is the use case: assume that in a table there are two column families, CF_A and CF_B. We have a MapReduce job running a scan with a SingleColumnValueFilter against CF_A:Column_1. For rows that don't contain CF_A, the code has nothing to load, and thus drops these types of rows quickly. I just want to make sure that this is the case with this patch.

          Ted Yu added a comment -
          +              KeyValue nextKV = this.joinedHeap.peek();
          +              while (true) {
          +                this.joinedHeap.next(results, limit - results.size());
          +                nextKV = this.joinedHeap.peek();
          

          I think the first peek() isn't needed because there is another peek() inside the loop.

          Max Lapan added a comment -

          @Thomas:

          Yes, this is the primary goal of this patch. When CF_B is large, we'll load only needed blocks from it (via seek), which could give a huge speedup in scan.

          @Zhihong:

          Thanks, I'll fix this; now waiting for the Jenkins results.
          Didn't know about reviews.apache.org, thanks. I'll post there, of course.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12515904/Filtered_scans_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -133 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1041//console

          This message is automatically generated.

          Ted Yu added a comment -

          The following line is too long:

          +            if (this.joinedHeap != null && this.joinedHeap.seek(KeyValue.createFirstOnRow(currentRow))) {
          

          Please limit to 80 chars per line.

          You can get Eclipse formatter from HBASE-3678.

          Max Lapan added a comment -

          Fixed comment, removed extra peek() call and folded long line.

          Max Lapan added a comment -

          @Zhihong: Having trouble posting a new review request - it gives a 500 error. Maybe this is related to the Apache JIRA issues; I'll try later.

          Max Lapan added a comment -

          @stack:
          Documentation paragraph to include. I think it should go here: http://hbase.apache.org/book.html#number.of.cfs

          There is a performance option to keep in mind during schema design. In some situations, a two- (or more) column-family schema can be much faster than a single-CF design. This can be the case when you have one small column family that is used to sieve out the larger rows held in other families. If SingleColumnValueFilter or SingleColumnValueExcludeFilter is used to find the needed rows, only the small family is scanned; the other families are loaded only once a matching row has been found. This can significantly reduce the amount of data loaded and lead to faster scans.

          Max Lapan added a comment -

          There is still a mistake somewhere; our stats scan returns different results.

          Ted Yu added a comment -

          Patch v5 is based on v4, with grammatical corrections.

          @Max:
          What do you think?

          @Override is missing for isFamilyEssential() in a few files.

          Ted Yu added a comment -

          Same as patch v5.
          I verified that patch v6 can be used to generate new review request.

          Nicolas Spiegelberg added a comment -

          Overall, I agree that this is a useful design pattern. We use this pattern in our messages deployment and other production use cases as well. I'm more concerned about this being in the critical path. This is deep in the core logic, which has a lot of complicated usage and is extremely bug-prone (even after extensive unit tests).

          If you don't need atomicity, then you don't get much benefit from solving this in the critical path. The change introduces a lot of risk and design decisions that we have to worry about years later. It might be some work to understand how to use a batch factor; but don't you think it would take more work to understand the variety of use cases for scans to ensure that we don't introduce side effects and make a scalable architectural decision?

          At the very least, we should get a scan expert to look at this code before committing. I'm not one, but I know this isn't the same as making a business-logic change. I just have one question about the patch right now: should we have a unit test case ensuring the interop between this feature and 'limit'? For example, ensure that the joinedHeap is scanned before going to the next row if the storeHeap results.size() == limit.

          Max Lapan added a comment -

          @Nicolas:
          Still, I have no idea how to resolve our slow-scan problem a different way. Two-phase RPC would be very inefficient in a map-reduce job, where we need to issue lots of gets for each obtained 'flag' row and have no good place to save them for a multi-get (which could be huge in some cases). Batching is also of little help there, because the slowness is not caused by large Results but by the tons of useless work performed by a regionserver on such scans. Or maybe I missed something?

          I agree that this solution is not elegant and complicates the scan machinery, but all other approaches look worse.

          Mikhail Bautin added a comment -

          @Max: if you scan the 'flag' column family first, find the rows that you are interested in, and query only those rows from the 'snap' column family, you will avoid the slowness from scanning every row in 'snap'. With proper batching, the two-pass approach should work fine if you don't need atomicity.

          The problem with such deep changes to the scanner framework is that it would require comprehensive new unit tests. The included unit test only writes three rows and does not really check the new feature (or the old functionality) on a large scale. Take a look at TestMultiColumnScanner and TestSeekOptimizations. We will need something at least as comprehensive as those tests for this improvement, probably even a multithreaded test case to ensure we don't break atomicity. If we do not do that testing now, we will still have to do it before the next stable release, but it would be unfair to pass the hidden costs of testing to those who don't need this particular optimization right now but will soon need a stable system for another production release.

          Kannan Muthukkaruppan added a comment -

          +1 to what Mikhail said.

          Max - this is an interesting use case. I will take a closer look at the changes. But if it is indeed the case that the set of rows you need to look up in the second CF is a small % of the total data in that CF, then issuing subsequent gets (point lookups) for the relevant keys in that CF should work reasonably well, correct? BTW, are you doing this using HTableInputFormat? Perhaps you can detail the structure of your MR job more, and we can work through some specific options.

          Thomas Pan added a comment - - edited

          Atomicity can be achieved by applying the filter set twice. I agree with Mikhail that we need good code quality and decent unit-test coverage. Complexity in the critical path might be a concern. Performance might be another, as certain use cases benefit from the approach while others don't. Thus, we could consider execution plans from the relational-database world for SQL tuning: once the data is in the table, tune the execution plan (which way to go) against the particular use case(s). Just my $0.02.

          Max Lapan added a comment -

          @all
          Thanks for the discussion; I'll benchmark the 2-phase approach, maybe it is indeed a solution.
          One thing is still not clear to me: how could the batching factor improve get performance? The get request is synchronous, isn't it? So, in the mapper, I issue a get to obtain the value from the large column and wait for it to be ready. A single get won't be significantly heavier than a seek in a scanner, but batching seems to be no help there. I could be wrong here; I didn't experiment much with that.

          Max Lapan added a comment -

          Fixed issues with limits in next() call.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12529061/Filtered_scans_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1982//console

          This message is automatically generated.

          Max Lapan added a comment -

          After a long delay, I decided to return to this optimization.
          We have had this patch on our production system (300 TB of HBase data, 160 nodes) for the last two months without issues. Tests of the 2-phase approach demonstrated a much smaller performance improvement than this patch: only a 2x speedup versus nearly 20x.

          I extended the tests, but I don't feel experienced enough to implement a concurrent, multithreaded test as suggested, sorry.

          Ted Yu added a comment -

          @Max:
          The new patch is much larger than the previous version. Can you provide a more detailed description of the change?

          Thanks

          Max Lapan added a comment -

          The additional code handles the case when InternalScanner::next is called with limit != -1. In this case, we must remember which KeyValueHeap we were populating when the limit was reached, and resume populating it on the next method call.

          I also added a test case for this situation.

          Ted Yu added a comment -

          Will go over the patch when I get into the office.

          It would be nice to use https://reviews.apache.org to facilitate reviews.

          Max Lapan added a comment -

          I tried to post it there, but I constantly get an internal server error.

          Max Lapan added a comment -

          Ahhh, I'm stupid, it works with hbase-git repository. Posted https://reviews.apache.org/r/5225/

          Max Lapan added a comment -

          Fixed issues with an incorrect rebase, and applied the suggested changes from the first review.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12529706/Filtered_scans_v5.1.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop23. The patch compiles against the hadoop 0.23.x profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestRegionRebalancing

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1994//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1994//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1994//console

          This message is automatically generated.

          Ted Yu added a comment -

          Rebased Max's latest patch on trunk.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530695/5416-Filtered_scans_v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2094//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2094//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12530699/5416-Filtered_scans_v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2095//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2095//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2095//console

          This message is automatically generated.

          Max Lapan added a comment -

          Implemented benchmark of joined scanners.
You can run it with mvn test -P localTests -Dtest=TestJoinedScanners. It lasts for about an hour, so don't forget to increase forkedProcessTimeoutInSeconds in the pom.xml file.

          On my notebook I got the following output:

          2012-06-29 22:12:00,182 INFO [main] regionserver.TestJoinedScanners(102): Make 100000 rows, total size = 9765.0 MB
          2012-06-29 22:56:51,231 INFO [main] regionserver.TestJoinedScanners(128): Data generated in 2691.048310914 seconds
          2012-06-29 23:03:03,865 INFO [main] regionserver.TestJoinedScanners(152): Slow scanner finished in 372.634075184 seconds, got 1000 rows
          2012-06-29 23:04:02,443 INFO [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 58.577552657 seconds, got 1000 rows
          2012-06-29 23:09:41,837 INFO [main] regionserver.TestJoinedScanners(195): Slow scanner finished in 339.394307354 seconds, got 1000 rows

I ran the slow scanner test twice to be sure that it's not a cache effect. So it's about a 5.7x speedup on this toy data.

          Ted Yu added a comment -

          @Andrew, @Mikhail:
What do you think of the performance comparison shown above?

          Ted Yu added a comment -

          I ran TestJoinedScanners on Linux and observed the following in test output:

          2012-07-16 17:31:52,339 INFO  [main] regionserver.TestJoinedScanners(152): Slow scanner finished in 96.393137286 seconds, got 1000 rows
          ...
          2012-07-16 17:32:05,026 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 12.687607287 seconds, got 1000 rows
          
          Ted Yu added a comment -

Can I assume that there are no further review comments for this feature?

          Sergey Shelukhin added a comment -

          Hi. Should this be ok to commit to trunk? Thanks.

          Anoop Sam John added a comment -

I got a chance to go through this and the discussion around it.
@Max, clearly it is a good idea. The improvement in your scenario will be huge.
The concerns about the change are worth considering, I guess. It is a very critical path.
I have one idea for you to solve the problem without a 2-phase RPC.
How about the below way?
e.g.: I have one table with 2 CFs (cf1, cf2) and a SCVF condition on cf1 (cf1:c1=v1)
          1. Create a Scan from the client side with only cf1 specified and with the filter

SingleColumnValueFilter filter = new SingleColumnValueFilter(cf1, c1,
    CompareOp.EQUAL, v1);
Scan scan = new Scan();
scan.setFilter(filter);
scan.addFamily(cf1); // only the small, essential family is scanned here
for (Result result : ht.getScanner(scan)) {
  // deal with result
}
          

2. Implement a RegionObserver CP and override the postScannerNext() hook. This hook executes within the server.
In the hook, for every row key the scan selects, create a Get request for the remaining CFs and add those KVs to the Result as well:

public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
    InternalScanner s, List<Result> results, int limit, boolean hasMore) throws IOException {
  // next() calls happen on one region from the HRS
  HRegion region = e.getEnvironment().getRegion();
  List<Result> finalResults = new ArrayList<Result>(results.size());
  for (Result result : results) {
    // Every result corresponds to one row. Assume no batching is being used.
    byte[] row = result.getRow();
    Get get = new Get(row);
    get.addFamily(cf2); // cf1 is already fetched
    Result result2 = region.get(get, null);
    List<KeyValue> finalKVs = new ArrayList<KeyValue>();
    finalKVs.addAll(result.list());
    finalKVs.addAll(result2.list());
    finalResults.add(new Result(finalKVs));
  }
  // Replace the results with the new finalResults.
  results.clear();
  results.addAll(finalResults);
  return hasMore;
}
          

This hook is at the HRS level and runs after the Result object preparation. Right now we don't have any other hook during the scanner next() calls further down the line that would let us deal with the KV list, so we need to recreate the Result, which makes for some ugly coding...
This way it should be possible to fetch the data you want. Maybe not as optimal as the internal change, but still far, far better than the 2 RPC calls...
Now with CPs we can achieve many things..

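For anyone trying this route, here is a minimal sketch of how such an observer could be wired up. The class name com.example.JoinOnDemandObserver and the table name are placeholders, not from this issue; any RegionObserver overriding postScannerNext() can be attached per table via the descriptor, or for all tables via the hbase.coprocessor.region.classes property.

// Hypothetical wiring for the observer sketched above; names are placeholders.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor htd = new HTableDescriptor("mytable");
htd.addFamily(new HColumnDescriptor(cf1)); // small family the filter reads
htd.addFamily(new HColumnDescriptor(cf2)); // large family, fetched on demand
// Attach the coprocessor so postScannerNext() runs inside the region server.
htd.addCoprocessor("com.example.JoinOnDemandObserver");
admin.createTable(htd);
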
          Max Lapan added a comment -

Yes, I think a CP will work, thanks. The sad thing is that we use the 0.90.6 (CDH) version of HBase, which doesn't have CPs. In fact, we use this patch on our production system without major issues and are quite happy with it. But I don't think it's a good idea to include it in trunk when a much better approach exists.

          Sergey Shelukhin added a comment -

I am not very familiar with the actual user scenarios of many HBase users yet, but the example outlined above (filter on a small column, fetch the big column only as needed) seems very general.
          Am I missing something?
          For any media (images/videos/documents)/binary storage, where contents don't change that often, that is a straightforward and legitimate perf boost.
          Maybe this can be made optional, with old behavior as default?
          CP approach seems kind of hacky for everyone to re-implement.

          Ted Yu added a comment -

          I agree with Sergey's point above.

          Anoop Sam John added a comment -

Sergey and Ted,
I do agree 100% that it is a great idea. I think this is something the core should do rather than asking users to do it.

          Anoop Sam John added a comment -

My only point was that it is possible using a CP. Above I had seen only people suggesting ways with 2-level RPCs, so this solution came to my mind; I was just sharing it so that people in need have a workaround.
I agree that it is kind of hacky. It is ugly code, too.

          Sergey Shelukhin added a comment -

Hmm... then, are there other specific objections or disagreements with the above, or should we proceed with the patch?

          Max Lapan added a comment -

If no one is against inclusion, let's include it. But I have a small improvement to make. Personally, I don't like the filter interface alteration. When I started, I thought more filters would take advantage of the optimization, but only SingleColumnValueFilter and SingleColumnValueExcludeFilter do. So I'd rather just check for these filters in HRegionScanner than introduce an extra method in the interface.

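A rough sketch of the instanceof-based check Max describes, for illustration only (the attached patches instead add an isFamilyEssential() method to the Filter interface; the filter and family variables are assumed to be in scope in HRegionScanner):

// Decide whether a store's family is essential by special-casing the filter type.
// SingleColumnValueExcludeFilter extends SingleColumnValueFilter, so one check covers both.
boolean essential;
if (filter instanceof SingleColumnValueFilter) {
  essential = Bytes.equals(family, ((SingleColumnValueFilter) filter).getFamily());
} else {
  essential = true; // unknown filter: load every family up front, i.e. the old behavior
}
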
          stack added a comment -

Sergey Shelukhin See the concerns raised above by the likes of Nicolas and Mikhail: this patch messes with a critical code path, so we should be careful committing it, first ensuring sufficient test coverage and no degradation in perf.

          Sergey Shelukhin added a comment -

stack My point was that the approach is sound, and that the change being risky is not, on its own, a good reason not to make it. +1 on tests/perf tests

          stack added a comment -

Sergey Shelukhin If after sufficient tests (and perf), for sure. I think the case that the change has sufficient tests needs to be made before it goes in.

          Sergey Shelukhin added a comment -

I've rebased the latest patch (not sure if I did it right, the code changed around there) and the new large test now times out on my machine... I will look into it further.

          Max Lapan added a comment -

Have you increased the forkedProcessTimeoutInSeconds option in pom.xml?

          Sergey Shelukhin added a comment -

Hmm, no... let me try with it increased. Should this setting be set in the patch too? This test is going to run with the other tests every time, I assume.

          Sergey Shelukhin added a comment -

With the default setting, I got tired of waiting for it after about an hour and canceled it. I think it was still inserting rows, although my laptop drive was extremely sluggish, so maybe the logs hadn't flushed and that's why I didn't see the "Data inserted" log message.
However, after changing the number of writes to 5k, I get a very impressive performance improvement:
          2012-12-13 14:25:48,699 INFO [main] regionserver.TestJoinedScanners(155): Slow scanner finished in 5.15852 seconds, got 50 rows
          ...
          2012-12-13 14:25:48,931 INFO [main] regionserver.TestJoinedScanners(175): Joined scanner finished in 0.231946 seconds, got 50 rows

It looks like it would be a good idea from a perf standpoint to include this in trunk.
Max Lapan do you want to continue on the patch? Otherwise I can continue on it.
I think we'd want to make it optional (in config and/or per request) to address some of the stability concerns.
I also want to look at the patch in more detail to see what more testing can be done. It seems like maybe two LoadTestTools writing and verifying against the same table in parallel could be used, or something, in an integration test.

          I get "Filter with filterRow(List<KeyValue>) incompatible with scan with limit!" in testScanner_JoinedScannersAndLimits after rebase, I'll look into it.

          Sergey Shelukhin added a comment -

Hmm, on reruns, I cannot repro the gains above on small datasets. However, I do see a difference in IO activity between the slow and joined scanners.

          Sergey Shelukhin added a comment -

Ok, here's the rebased patch; mvn test appears to pass at least the small tests, the rest are running now.
Can you please sanity-check?

I modified the perf test to run within the normal test time and to run the scan 10 times (at least on my laptop, most of the time is taken by the load, probably due to compactions). Once it settles after 1-2 iterations, I see a perf improvement of about 20-25% on the small dataset.

          Tomorrow I will probably look at config/more tests.

          Ted Yu added a comment -

          Based on patch v7, I got the following result on MacBook:

          grep 'scanner finished in' ../testJoinedScanners-output.txt
          2012-12-13 20:09:26,809 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 112.792634 seconds, got 100 rows
          2012-12-13 20:10:15,726 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 48.915989 seconds, got 100 rows
          2012-12-13 20:10:33,006 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 17.280432 seconds, got 100 rows
          2012-12-13 20:10:38,514 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 5.508207 seconds, got 100 rows
          2012-12-13 20:10:51,095 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 12.580323 seconds, got 100 rows
          2012-12-13 20:11:00,517 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 9.422024 seconds, got 100 rows
          2012-12-13 20:11:22,650 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 22.132854 seconds, got 100 rows
          2012-12-13 20:11:31,890 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 9.23955 seconds, got 100 rows
          2012-12-13 20:11:34,421 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.531598 seconds, got 100 rows
          2012-12-13 20:11:36,694 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 2.272578 seconds, got 100 rows
          2012-12-13 20:11:39,197 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.502777 seconds, got 100 rows
          2012-12-13 20:11:58,269 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 19.071438 seconds, got 100 rows
          2012-12-13 20:12:01,043 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.774262 seconds, got 100 rows
          2012-12-13 20:12:03,317 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 2.273745 seconds, got 100 rows
          2012-12-13 20:12:05,981 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.664124 seconds, got 100 rows
          2012-12-13 20:12:08,574 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 2.593234 seconds, got 100 rows
          2012-12-13 20:12:11,130 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.555977 seconds, got 100 rows
          2012-12-13 20:12:13,381 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 2.250275 seconds, got 100 rows
          2012-12-13 20:12:15,721 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 2.340003 seconds, got 100 rows
          2012-12-13 20:12:18,075 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 2.354218 seconds, got 100 rows
          

          I am running the test on Linux.

          Will take another look at the patch and test result tomorrow.

          Ted Yu added a comment -

          Here is test result from Linux:

          grep 'scanner finished in' testJoinedScanners-output.txt
          2012-12-13 20:28:36,780 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 29.421479079 seconds, got 100 rows
          2012-12-13 20:28:47,617 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 10.836890451 seconds, got 100 rows
          2012-12-13 20:28:58,637 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 11.019543361 seconds, got 100 rows
          2012-12-13 20:29:07,865 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 9.227820454 seconds, got 100 rows
          2012-12-13 20:29:17,690 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.824966218 seconds, got 100 rows
          2012-12-13 20:29:26,317 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.626794601 seconds, got 100 rows
          2012-12-13 20:29:36,288 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.97033987 seconds, got 100 rows
          2012-12-13 20:29:45,033 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.745137076 seconds, got 100 rows
          2012-12-13 20:29:55,023 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.989630848 seconds, got 100 rows
          2012-12-13 20:30:03,416 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.392952897 seconds, got 100 rows
          2012-12-13 20:30:12,267 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 8.850649054 seconds, got 100 rows
          2012-12-13 20:30:20,985 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.718266736 seconds, got 100 rows
          2012-12-13 20:30:30,108 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.122057799 seconds, got 100 rows
          2012-12-13 20:30:38,669 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.561782079 seconds, got 100 rows
          2012-12-13 20:30:47,898 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.228045508 seconds, got 100 rows
          2012-12-13 20:30:57,057 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 9.158965127 seconds, got 100 rows
          2012-12-13 20:31:07,428 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 10.370526135 seconds, got 100 rows
          2012-12-13 20:31:16,586 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 9.157627332 seconds, got 100 rows
          2012-12-13 20:31:25,612 INFO  [main] regionserver.TestJoinedScanners(172): Slow scanner finished in 9.026821302 seconds, got 100 rows
          2012-12-13 20:31:34,553 INFO  [main] regionserver.TestJoinedScanners(172): Joined scanner finished in 8.93992941 seconds, got 100 rows
          
          Ted Yu added a comment -

There is a significant performance gain for the joined scanner in the first few iterations.
The test generates data and then runs scans alternately using the normal and joined scanners. I think we should also test scenarios where write and scan operations interleave; see the sketch below.

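A minimal sketch of such an interleaved scenario (table, family and column names are made up here, not taken from TestJoinedScanners; the table is assumed to already exist with both families):

// One thread keeps writing while the main thread scans with the essential-family
// filter and checks that matching rows still carry the large, lazily loaded family.
final Configuration conf = HBaseConfiguration.create();
final byte[] cfEssential = Bytes.toBytes("flags");
final byte[] cfJoined = Bytes.toBytes("snap");
final byte[] col = Bytes.toBytes("col");
Thread writer = new Thread(new Runnable() {
  public void run() {
    try {
      // HTable is not thread-safe, so the writer uses its own instance.
      HTable t = new HTable(conf, "interleaveTest");
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes(String.format("row-%09d", i)));
        put.add(cfEssential, col, Bytes.toBytes(i % 100 == 0 ? "Y" : "N"));
        put.add(cfJoined, col, new byte[32 * 1024]); // the "large" value
        t.put(put);
      }
      t.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
});
writer.start();
HTable table = new HTable(conf, "interleaveTest");
Scan scan = new Scan();
scan.setFilter(new SingleColumnValueFilter(cfEssential, col,
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("Y")));
while (writer.isAlive()) {
  for (Result r : table.getScanner(scan)) {
    // Rows accepted by the filter must still come back with the joined family.
    assert r.getValue(cfJoined, col) != null;
  }
}
writer.join(); // throws InterruptedException in a real test method
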
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12560912/HBASE-5416-v7-rebased.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any additional warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3542//console

          This message is automatically generated.

          Ted Yu added a comment -
               KeyValueHeap storeHeap = null;
          +    KeyValueHeap joinedHeap = null;
          

          Add a comment explaining the role of joinedHeap.
          joinedHeap is only referenced in RegionScannerImpl. So it can be private.

          +    // state flag which indicates when joined heap data gather interrupted due to scan limits
          

          'data gather interrupted' -> 'data gathering is interrupted'

          +      // Here we separate all scanners into two lists - first is scanners,
          +      // providing data required by the filter to operate (scanners list) and
          

          'first is scanners, providing' -> 'first are scanners that provide'

          +     * @return true if limit reached, false ovewise.
          

          'ovewise' -> 'otherwise'

          +          if (isEmptyRow /* TODO: || filterRow() is gone in trunk */) {
          

          Can the TODO be removed ?

                     }
          +          else {
          

          nit: lift else to follow }

          +                // As the data obtained from two independent heaps, we need to
          

          'the data obtained' -> 'the data is obtained'

          +                // Result list population was interrupted by limits, we need to restart it on next() invocation.
          

          nit: wrap long line.

          +            // the case when SingleValueExcludeFilter is used.
          

          'SingleValueExcludeFilter' -> 'SingleColumnValueExcludeFilter'

          +    private KeyValue peekKv() {
          +      return this.joinedHeapHasMoreData ? this.joinedHeap.peek() : this.storeHeap.peek();
          +    }
          

          The above method is only called once. Consider merging the body into caller.

          +  public void testScanner_JoinedScannersAndLimits() throws IOException {
          

          nit: JoinedScannersAndLimits -> JoinedScannersWithLimits

          For TestJoinedScanners.java, remove year in license header.

          +      LOG.info("Make " + Long.toString(rows_to_insert) + " rows, total size = " + Float.toString(rows_to_insert * large_bytes / 1024 / 1024) + " MB");
          ...
          +      LOG.info("Data generated in " + Double.toString((System.nanoTime() - time) / 1000000000.0) + " seconds");
          ...
          +    SingleColumnValueFilter flt = new SingleColumnValueFilter(cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
          ...
          +      + " scanner finished in " + Double.toString(timeSec) + " seconds, got " + Long.toString(rows_count/2) + " rows");
          

          nit: wrap long line

          Sergey Shelukhin added a comment -

I wonder if it could be an issue with the rebase, or with changes to trunk after the last patch; or whether 100k rows are needed to show the improvement?
I get similar results now.
I need to add some logging to see if the new code paths are actually being run.

          Sergey Shelukhin added a comment -

Looks like a scanner serialization issue, probably due to PB; I'll update the patch.

          Sergey Shelukhin added a comment -

          ...or maybe not. Hmm

          Sergey Shelukhin added a comment -

          ok I fixed it, patch coming after lunch

          Sergey Shelukhin added a comment -

Patch: fixes issues with the rebase; test modifications compared to the v7 patch as outlined above; makes the feature configurable globally and per request (and per table, once HBASE-7236 goes in).
          With this patch I get the perf gains back on my laptop.

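For reference, this is roughly what the two knobs look like from the client side. The names below are the ones that ended up in HBase, so treat them as an assumption for the v8 patch itself:

// Global default for all scans, via configuration:
Configuration conf = HBaseConfiguration.create();
conf.setBoolean("hbase.hregion.scan.loadColumnFamiliesOnDemand", true);

// Per-request override on a single Scan:
Scan scan = new Scan();
scan.setLoadColumnFamiliesOnDemand(true);
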
          Ted Yu added a comment - - edited
          +  public static KeyValue createFirstOnRow(final byte [] row, int roffset, short rlength) {
          +    return new KeyValue(row, roffset, rlength,
          +        null, 0, 0, null, 0, 0, HConstants.LATEST_TIMESTAMP, Type.Minimum, null, 0, 0);
          

          The other createFirstOnRow() methods use Type.Maximum

          +  public enum LazyCfLoadingValue {
          

          The above enum is a tri-state boolean. I think using Boolean should suffice.

          +    // Heap of key-values that are not necessary for the provided filters and are thus read
          +    // on demand, if lazy cf loading is enabled.
          +    KeyValueHeap joinedHeap = null;
          

          'not necessary' -> 'not essential'
          'cf loading' -> 'column family loading'

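To make the tri-state discussion concrete, a small illustration (only the enum name LazyCfLoadingValue is from the patch; the constant names and fallback logic here are guesses):

// Option 1: an explicit tri-state enum, as in patch v8.
enum LazyCfLoadingValue { UNSET, ENABLED, DISABLED }

// Option 2: java.lang.Boolean, with null playing the "unset" role.
Boolean lazyCfLoading = null; // null = not specified on the request
boolean effective = (lazyCfLoading != null)
    ? lazyCfLoading.booleanValue()
    : conf.getBoolean("hbase.hregion.scan.loadColumnFamiliesOnDemand", false);
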
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561070/HBASE-5416-v8.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any additional warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 26 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3556//console

          This message is automatically generated.

          Ted Yu added a comment -

I ran the test suite with the following change based on patch v8:

          +  public static KeyValue createFirstOnRow(final byte [] row, int roffset, short rlength) {
          +    return new KeyValue(row, roffset, rlength,
          +        null, 0, 0, null, 0, 0, HConstants.LATEST_TIMESTAMP, Type.Maximum, null, 0, 0);
          +  }
          

          I got:

          [INFO] HBase ............................................. SUCCESS [3.012s]
          [INFO] HBase - Common .................................... SUCCESS [14.056s]
          [INFO] HBase - Protocol .................................. SUCCESS [13.557s]
          [INFO] HBase - Client .................................... SUCCESS [0.654s]
          [INFO] HBase - Hadoop Compatibility ...................... SUCCESS [0.733s]
          [INFO] HBase - Hadoop One Compatibility .................. SUCCESS [1.753s]
          [INFO] HBase - Server .................................... SUCCESS [44:06.793s]
          [INFO] HBase - Hadoop Two Compatibility .................. SUCCESS [9.770s]
          [INFO] HBase - Integration Tests ......................... SUCCESS [3.340s]
          [INFO] HBase - Examples .................................. SUCCESS [30.981s]
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          

          For TestJoinedScanners:

          2012-12-14 19:40:01,764 INFO  [pool-1-thread-1] regionserver.TestJoinedScanners(168): Slow scanner finished in 31.798885086 seconds, got 100 rows
          2012-12-14 19:40:03,710 INFO  [pool-1-thread-1] regionserver.TestJoinedScanners(168): Joined scanner finished in 1.946100741 seconds, got 100 rows
          ...
          2012-12-14 19:43:29,757 INFO  [pool-1-thread-1] regionserver.TestJoinedScanners(168): Slow scanner finished in 44.846782189 seconds, got 100 rows
          2012-12-14 19:43:31,871 INFO  [pool-1-thread-1] regionserver.TestJoinedScanners(168): Joined scanner finished in 2.11452886 seconds, got 100 rows
          
          Sergey Shelukhin added a comment -

          The above enum is a tri-state boolean. I think using Boolean should suffice.

          Imho the Boolean is not as expressive... I did Boolean first, then replaced it with the enum because the null checks looked sketchy.
          I guess I can replace it back if there's a strong opinion.

          Rest fixed.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561329/HBASE-5416-v9.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any additional warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 26 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          -1 core zombie tests. There are zombie tests. See build logs for details.

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3573//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          I have an integration test for this which I will probably put in a separate JIRA. It would do repeated scans and verification specific to this feature, similar to MultiThreadedReader, while multiple writer threads are running. Need to check whether it works.
          Any other updates on the patch/requirements if it works?
          Thanks.

          Max Lapan added a comment -

          Hi!

          The patch misses one small fix I made this summer (forgot to post it here, sorry). It is trivial in code, but a little tricky in logic.

          The problem is in SingleColumnValueFilter with filterIfMissing=false (the default). In that case, the filter must allow rows in which the filtered column is not present. But this performance optimisation has no way to detect such rows, because we scan the CFs referenced by filters first, so it can miss these rows in the result completely. The solution is quite simple - turn off the optimisation when filterIfMissing is false.

          Here is my patch for 0.90.6; could you please apply it?

          commit 66b32a09e59fe12bfab55e819336678114269bb8
          Author: Max Lapan <max.lapan@gmail.com>
          Date:   Thu Aug 30 17:22:45 2012 +0400
          
              Disable fast scans when filterIfMissed=false
          
          diff --git a/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java b/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          index 105009e..2983a5f 100644
          --- a/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          +++ b/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          @@ -276,9 +276,14 @@ public class SingleColumnValueFilter extends FilterBase {
           
             /**
              * The only thing this filter need to check row is given column family. So,
          -   * it's the only essential column in whole scan.
          +   * it's the only essential column in whole scan. If filterIfMissing==false,
          +   * all families are essential, because of a possibility to skip valid rows
          +   * without data in filtered CF.
              */
             public boolean isFamilyEssential(byte[] name) {
          -    return Bytes.equals(name, this.columnFamily);
          +    if (!this.filterIfMissing)
          +      return true;
          +    else
          +      return Bytes.equals(name, this.columnFamily);
             }
           }
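
          For context, this is roughly how a client hits the case described above; a minimal sketch, with hypothetical table/CF/qualifier names. With filterIfMissing=false, rows that lack the filtered column must still be returned, which is exactly what the lazy-loading path could not see before this fix:

            import org.apache.hadoop.hbase.client.Scan;
            import org.apache.hadoop.hbase.filter.CompareFilter;
            import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
            import org.apache.hadoop.hbase.util.Bytes;

            // Scan two CFs, filtering on the small "flags" CF.
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("flags"));
            scan.addFamily(Bytes.toBytes("snap"));
            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("flags"), Bytes.toBytes("valid"),
                CompareFilter.CompareOp.EQUAL, Bytes.toBytes("1"));
            // Default is false: rows with no flags:valid cell must also be
            // returned, so the filter must declare all families essential.
            filter.setFilterIfMissing(false);
            scan.setFilter(filter);
            scan.setLoadColumnFamiliesOnDemand(true); // opt in to lazy CF loading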
          
          Sergey Shelukhin added a comment -

          Hi, I will incorporate that.

          Sergey Shelukhin added a comment -

          Finding from the integration test - if a scan runs concurrently with splits and there are more than 2 column families, column families can be lost.
          I ran the same test with the table pre-split into 20 regions, and it passes every time.
          Running it again with the table pre-split into 5 regions, causing splits, it fails when it can only find 2 of the 3 column families for one row or another.
          Max: is this part of the acceptable inconsistency that was discussed above? I understand this can be a valid scenario (the speedup is huge at the cost of very few (or none, if pre-split) easily recoverable errors), but I just wonder if you are aware of it...

          Sergey Shelukhin added a comment -

          Integration test attached to HBASE-7383

          Sergey Shelukhin added a comment -

          Patch v10: expanded comment, incorporated latest suggestion.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561614/HBASE-5416-v10.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 28 new Findbugs (version 1.3.9) warnings.

          -1 release audit. The applied patch generated 232 release audit warnings (more than the trunk's current 84 warnings).

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
          org.apache.hadoop.hbase.TestDrainingServer

          -1 core zombie tests. There are zombie tests. See build logs for details.

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//testReport/
          Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3601//console

          This message is automatically generated.

          Max Lapan added a comment -

          > is this part of acceptable inconsistency that was discussed above? I understand this can be a valid scenario (speedup is huge at the cost of very few (or none if pre-split) easily recoverable errors), but just wonder if you are aware of that...

          No. It is very strange, because the splitting process is handled at the lower level of the Store class and should be transparent to the HRegion level (at least in the 0.90.6 codebase; maybe something changed dramatically). In our production cluster splits are quite common and run without issues. We had one problem (HBASE-6499), which was caused by seek/reseek not being called frequently enough; this may be a similar issue.

          ramkrishna.s.vasudevan added a comment -

          Some initial comments

          getAllowLazyCfLoadingRaw(), getAllowLazyCfLoading()
          

          Can these be renamed better?

          public enum LazyCfLoadingValue 
          

          Enum constants should be in CAPS.

          Sergey Shelukhin added a comment -

          Max Lapan I actually had to change (re)seek during the rebase; can you take a look and see if there's an error there? Other than that, I attached the test to the linked JIRA; I can repro it consistently by changing the initial regions-per-server constant in the test to a low value (1-5).

          Sergey Shelukhin added a comment -

          ramkrishna.s.vasudevan Thanks!

          Jimmy Xiang added a comment -

          I have not looked into the change to the critical core part. For the PB-related part, I think we don't need the LazyCfLoadingValue enum. If it is not set, it means disabled.

          It would be helpful to put the patch on the RB.

          stack added a comment -

          I do not see enough by way of tests yet to allay the strong concerns raised above about the intrusiveness of this patch. There is a bunch of new code in here in HRegion scanning, and it is not all excluded if the new feature flag is not set. I've not reviewed the changes in here closely (it is not a critical nor a blocker issue, so it is secondary to my thinking). I do not see evidence of close review by others. Has it been done?

          This changes the Filter interface, so 0.96 only (as said above).

          + * This can deliver huge perf gains when there's a cf with lots of data; however, it can
          + * also lead to some inconsistent results (e.g. due to concurrent updates, or splits).

          Can we have more detail on what the inconsistency referred to above is about?

          What is happening in SingleColumnValueExcludeFilter? We are removing filterKeyValue and putting in place filterRow and hasFilterRow?

          Should FilterBase do return filter.isFamilyEssential(name); rather than just return true in isFamilyEssential?

          Why is below in Region and not in RegionScanner?

          + // Heap of key-values that are not essential for the provided filters and are thus read
          + // on demand, if lazy column family loading is enabled.
          + KeyValueHeap joinedHeap = null;

          This is a little obscene:

          + Collections.sort(results, comparator);

          inside HRegion, merging results of 'essential' and 'non-essential' data (this should probably be rephrased...). It can't be avoided, though, given what is going on here.

          Sergey Shelukhin added a comment -

          stack wrt tests, aside from the tests in the patch, have you seen HBASE-7383? It has an integration test.

          stack added a comment -

          Sergey Shelukhin An IT was mentioned in the above comments. That sounds great (sounds like it found interesting issues around splitting – good stuff). Has anyone you know of done a close review (I see reviews that address formatting and grammar but little on the machinery – maybe that's because the details are without blemish?)? Any other unit tests we could get in there that poke at possible issues, particularly regarding changes made in the default path?

          Sergey Shelukhin added a comment -

          I thought ramkrishna.s.vasudevan was going to continue reviewing. In general it has no full +1 at this time.
          I will address minor feedback since last patch, and put the patch in review tool.

          Sergey Shelukhin added a comment -

          Jimmy Xiang The enum is tri-valued in order to allow a per-request override; i.e., there's a cluster/table default, but you can override it for each scan.
          Do you want to remove that? In that case, given that the feature is for specific types of tables (and thus override-on is more likely than override-off), I'd prefer to make false/absent fall back to the cluster setting (and the default for the cluster setting would be off).

          Jimmy Xiang added a comment -

          With PB, there is a method to check whether an optional parameter is set: hasXXX(). Does this help?

          Jimmy Xiang added a comment -

          If you make the type a Boolean object instead of a primitive, then you can use null to indicate not-set. Not sure if this is better.
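
          For illustration, a minimal sketch of the Boolean-as-tri-state idea (the holder class and resolution method here are hypothetical; only the setter name mirrors the actual Scan API): null means the client set nothing, so the server falls back to its configured default.

            // Hypothetical holder for the per-scan setting.
            public class LazyCfSetting {
              // null = unset; Boolean.TRUE/FALSE = explicit per-scan override.
              private Boolean loadColumnFamiliesOnDemand = null;

              public void setLoadColumnFamiliesOnDemand(boolean value) {
                this.loadColumnFamiliesOnDemand = Boolean.valueOf(value);
              }

              // Resolve against the region/cluster default when unset.
              public boolean isLoadingOnDemand(boolean regionDefault) {
                return loadColumnFamiliesOnDemand == null
                    ? regionDefault
                    : loadColumnFamiliesOnDemand.booleanValue();
              }
            }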

          Ted Yu added a comment -

          Why is below in Region and not in RegionScanner?

          The joinedHeap is added to RegionScannerImpl, alongside storeHeap.

          Sergey Shelukhin added a comment -

          What is happening in SingleColumnValueExcludeFilter? We are removing filterKeyValue and putting in place filterRow and hasFilterRow?

          Max commented above:
          "...I resolved this by checking that row is not empty right before filterRow(List) called, but this requires to slightly modify SingleColumnValueExcludeFilter logic - move exclude phase from filterKeyValue method to filterRow(List). The main reason for this is beacuse there is no way to distinguish at RegionScanner::nextInternal level empty row which is empty because of filter accepts row, but excludes all it's KVs and row which is empty due to filter rejects"

          Should filterBase do return filter.isFamilyEssential(name); rather than just return true in isEssentialFamily.

          FilterBase is the base class, and Filter::isFamilyEssential is abstract. I guess it's just the default behavior for most filters.
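
          In other words, the default being described amounts to something like the following sketch (paraphrased from the discussion, not the literal patch): FilterBase conservatively declares every family essential, so only filters that explicitly override the method can enable lazy loading.

            public abstract class FilterBase extends Filter {
              // Conservative default: treat every column family as essential,
              // so filters unaware of this feature keep today's behavior
              // (all CFs are loaded before the filter runs).
              @Override
              public boolean isFamilyEssential(byte[] name) {
                return true;
              }
              // ... other default implementations elided ...
            }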

          Why is below in Region and not in RegionScanner?

          It is in fact in RegionScanner, together with StoreHeap:

           class RegionScannerImpl implements RegionScanner {
              // Package local for testability
              KeyValueHeap storeHeap = null;
              // Heap of key-values that are not essential for the provided filters and are thus read
              // on demand, if lazy column family loading is enabled.
              KeyValueHeap joinedHeap = null;
          

          This is a little obscene:

          + Collections.sort(results, comparator);

          inside in HRegion merging results of 'essential' and 'non-essential' data (this probably should be rephrased...). Can't be avoided though given what is going on here.

          That is actually an interesting point: is there anything that prevents us from sorting the results only at the end, not for each row?
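
          To make the sorting question concrete: each heap yields its share of the row's KVs in order, but concatenating the two runs interleaves column families out of order, so the merged row has to be re-sorted. A minimal sketch (the helper name is hypothetical):

            import java.util.ArrayList;
            import java.util.Collections;
            import java.util.List;
            import org.apache.hadoop.hbase.KeyValue;

            // Merge one row's output from the essential (storeHeap) and
            // joined (joinedHeap) scanners back into overall KeyValue order.
            static List<KeyValue> mergeRow(List<KeyValue> essentialKvs, List<KeyValue> joinedKvs) {
              List<KeyValue> results = new ArrayList<KeyValue>(essentialKvs.size() + joinedKvs.size());
              results.addAll(essentialKvs); // sorted among themselves
              results.addAll(joinedKvs);    // sorted among themselves
              Collections.sort(results, KeyValue.COMPARATOR);
              return results;
            }

          One hedged answer to the question itself: since next() hands results back one row at a time, the latest point a server-side sort can happen is the row boundary, which is why it runs per row rather than once at the end.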

          Sergey Shelukhin added a comment -

          As in, for each next call on the heap.

          Sergey Shelukhin added a comment -

          Review on https://reviews.apache.org/r/8689/. Changed the value in Scan to Boolean due to popular demand, renamed the settings, other various feedback.

          Sergey Shelukhin added a comment -

          stack the only change on the main path not conditional on joinedScanners/joinedHeap/etc. being present seems to be refactoring the while loop under

          } else if (filterRowKey(currentRow, offset, length)) {
          

          into populateResult, which in this case will do one extra matchingRow check of the storeHeap's current row against itself (it pre-checks the current heap KV instead of post-checking, but in this case the row we use for the check was just created from this very heap). I don't think a test could be that targeted, even though there's potential to introduce bugs when refactoring...

          stack added a comment -

          Thanks for feedback.

          The use case is valid. So are the concerns, raised by more than one dev, that we are tinkering w/ a critical path. What do you think, lads, could we do to put folks at ease with this patch? The integration testing helps big time. Keeping all the new code optionally enabled helps too. Any chance of more careful reviews? That would help as well. Can we think of any other unit tests we could add?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12561806/HBASE-5416-v11.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 28 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestMultiParallel
          org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
          org.apache.hadoop.hbase.client.TestFromClientSide
          org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
          org.apache.hadoop.hbase.catalog.TestMetaReaderEditor

          -1 core zombie tests. There are zombie tests. See build logs for details.

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3622//console

          This message is automatically generated.

          Sergey Shelukhin added a comment -

          Hmm, I will look at all these tests.

          Sergey Shelukhin added a comment -

          They appear to pass locally.

          Anoop Sam John added a comment -

          Sergey Shelukhin
          The issue coming up during splits may be a concern.
          This happens only with this new patch, right?
          I will have a look at the patch and the flow and see whether I can help.

          ramkrishna.s.vasudevan added a comment -

          I will continue to look at the patch; I have looked into about half of it.
          The test case failures could be a concern. Will continue the review today.

          ramkrishna.s.vasudevan added a comment -

          OK, grokked the patch.

            if (!scan.getAllowLazyCfLoading()
                    || this.filter == null || this.filter.isFamilyEssential(entry.getKey())) {
          

          Move this.filter == null to be the first condition, because when you don't have filters the entire joinedHeap is not going to be used, right?
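
          The suggested reordering would look roughly like this (the surrounding variable names are assumed from the snippet quoted above); with no filter set, the null check short-circuits before any per-family work:

            if (this.filter == null || !scan.getAllowLazyCfLoading()
                || this.filter.isFamilyEssential(entry.getKey())) {
              scanners.add(scanner);       // essential: read eagerly
            } else {
              joinedScanners.add(scanner); // non-essential: read on demand
            }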

           correct_row = this.joinedHeap.seek(KeyValue.createFirstOnRow(currentRow, offset, length));
          

          So here we move to the KV just before the row we got in the current next() call?
          After this, suppose that due to limits it says joinedHeapHasMoreData = true; now when the next call comes:

          else if (joinedHeapHasMoreData) {
                    joinedHeapHasMoreData =
                      populateResult(this.joinedHeap, limit, currentRow, offset, length, metric);
                    return true;
          

          I think we should get the return value from populateResult, and if it returns false we may need to check whether we have reached the stopRow, right?
          Filters need not be checked anyway.

          One more thing: if I say in my Scan that I want lazy loading, but my filter is NOT of the type SCVF or one of the others that implement isFamilyEssential, then it goes through the normal flow. Maybe we need to document this clearly, as a user may think that setting that property will give them a better-optimized scan.

          Regarding the TestHRegion test cases: they do not actually test the behaviour of the joined scanners, though the test case names suggest they do. Is that intended?
          I will leave it to the other scan experts to decide whether this can go in. Overall a very good improvement.

          Thanks to Max, Sergey and Ted.

          ramkrishna.s.vasudevan added a comment -

          @Sergey
          The split-related failure has to be investigated. Will try looking into the possible reason for the failure.

          Sergey Shelukhin added a comment -

          Hmm... I am trying to clean up code a bit now and write comments.
          It seems that this patch shouldn't work with limits at all... in the big else clause, if we get false from populateResult on storeHeap, we go on to start getting stuff from the joinedHeap. Suppose stopRow is true, i.e. storeHeap.peek() now points at the stop row.
          Suppose now we hit the limit, set joinedHeapHasMoreData, and return true.
          On the next call, storeHeap is still pointing at the stop row, so we won't even reach the else if (joinedHeapHasMoreData) condition (and if we did, we'd populate nothing because matchingRow would always return false).

          Can someone please sanity check me?
          I'll see how to fix it.

          Sergey Shelukhin added a comment -

          Cleaned up the method and added a bunch of comments so that it's clear what is going on. Fixed the above issues.
          I will look at the integration test issue next.

          Sergey Shelukhin added a comment -

          It appears that I have accidentally fixed something. I can repro with the v11 patch but not with v12...

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562021/HBASE-5416-v12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 28 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestMultiParallel
          org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
          org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor
          org.apache.hadoop.hbase.client.TestFromClientSide
          org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol

          -1 core zombie tests. There are zombie tests. See build logs for details.

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3641//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562033/HBASE-5416-v12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 28 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.util.TestMergeTable

          -1 core zombie tests. There are zombie tests. See build logs for details.

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3644//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          I will take a look at the updated patch by tomorrow evening.

          stack added a comment -

          It appears that I have accidentally fixed something. I can repro with v11 patch but not v12...

:)

          Ted Yu added a comment -

          Looks like ClientProtos.java needs to be regenerated due to recent changes in trunk.

          Sergey Shelukhin added a comment -

I replaced the v12 patch to do that yesterday evening. Or do you mean again? I will regenerate at the next iteration, just in case.

          Ted Yu added a comment -

          I think the rewritten logic is easier to understand.

          +  public static final String LOAD_CFS_ON_DEMAND_CONFIG_KEY = "hbase.hregion.scan.loadColumnFamiliesOnDemand";
          

          Wrap long line.

          +   public boolean isLoadingCfsOnDemandDefault() {
          

Can the 'Default' be dropped from the method name? We're interested in whether on-demand loading is on.

          +      List<KeyValueScanner> joinedScanners = new ArrayList<KeyValueScanner>();
          

Should we check scan.doLoadColumnFamiliesOnDemand() first so that we don't allocate the ArrayList if this feature is turned off?

          +     * Fetches records with this row into result list, until next row or limit (if not -1).
          

          'this row' -> 'currentRow'
          'result list' -> 'results list'

          +        // Check if we were getting data from the joinedHeap abd hit the limit.
          

          'abd' -> 'and'

          +          // Techically, if we hit limits before on this row, we don't need this call.
          

          Typo: Techically

          +          // Populating from the joined map was stopped by limits, populate some more.
          

          'joined map' -> 'joined heap'

          +        // the case when SingleValueExcludeFilter is used.
          

          SingleValueExcludeFilter -> SingleColumnValueExcludeFilter
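
For the joinedScanners point above, here is a minimal sketch, with stand-in types (ScannerLike and FilterLike are placeholders for this sketch only, not HBase classes), of deferring the list allocation until a non-essential family actually shows up:

          import java.util.ArrayList;
          import java.util.List;

          class JoinedScannerPartitionSketch {
            interface ScannerLike { }
            interface FilterLike { boolean isFamilyEssential(byte[] family); }

            /** Splits per-family scanners; returns the joined list, or null if unused. */
            static List<ScannerLike> partition(byte[][] families, ScannerLike[] scanners,
                FilterLike filter, boolean loadCfsOnDemand, List<ScannerLike> essential) {
              List<ScannerLike> joined = null;  // allocated lazily, per the review comment
              for (int i = 0; i < families.length; i++) {
                if (filter == null || !loadCfsOnDemand || filter.isFamilyEssential(families[i])) {
                  essential.add(scanners[i]);
                } else {
                  if (joined == null) {
                    joined = new ArrayList<ScannerLike>();
                  }
                  joined.add(scanners[i]);
                }
              }
              return joined;  // null means no joined heap is needed for this scan
            }
          }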

          ramkrishna.s.vasudevan added a comment -

I found one thing here:

                  // Let's see what we have in the storeHeap.
                  KeyValue current = this.storeHeap.peek();
                  boolean stopRow = isStopRow(current);
          

Assume I have one CF and a filter with isFamilyEssential = true. Also, the scan says lazy loading of CFs is allowed.
Now in this case I will have only the joinedScanner heap.
So now when next() is called,
current will be null, as nothing is there in the storeHeap.
          Inside isStopRow

            return kv == null || kv.getBuffer() == null ||
                    (stopRow != null &&
                    comparator.compareRows(stopRow, 0, stopRow.length,
                        kv.getBuffer(), kv.getRowOffset(), kv.getRowLength()) <= isScan);
          

          This will give me true. So stopRow is true.

          if (stopRow) {
                      if (filter != null && filter.hasFilterRow()) {
                        filter.filterRow(results);
                      }
                      return false;
                    }
          

So this code will just return without scanning. Please correct me if I am wrong.

          Ted Yu added a comment -

Now in this case I will have only the joinedScanner heap.

          I think you meant we only have storeHeap.

Here is the related code from HRegion.nextInternal() in the 0.94 branch:

                  if (isStopRow(currentRow, offset, length)) {
                    if (filter != null && filter.hasFilterRow()) {
                      filter.filterRow(results);
                    }
                    if (filter != null && filter.filterRow()) {
                      results.clear();
                    }
          
                    return false;
          

          We can see the logic is the same as that in patch v12.

          Ted Yu added a comment -

          Patch v13 rebases on trunk.
          TestSingleColumnValueExcludeFilter, TestHRegion and TestJoinedScanners pass.

          Max Lapan added a comment -

          I think you meant we only have storeHeap.

No: exactly one KVS in the joinedScanner heap and an empty storeHeap. It was caused by the extra !scan.doLoadColumnFamiliesOnDemand() condition in the constructor.

          ramkrishna.s.vasudevan added a comment -

Yes Max, that's what I meant.
Ted, just take the condition where isFamilyEssential = true, the scan says lazy loading of CFs is allowed, and there is also only one CF.
So storeHeap will be null in this case.

          Ted Yu added a comment -

          Here is the change related to Max's comment above:

          +        if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
          +          || this.filter.isFamilyEssential(entry.getKey())) {
          +          scanners.add(scanner);
          

          Ram said:

          Also scan says lazy loading of CF is allowed.

So the condition !scan.doLoadColumnFamiliesOnDemand() should be false, right?

          Ted Yu added a comment -

          Here is code from patch v7:

          +        if (this.filter == null || this.filter.isFamilyEssential(entry.getKey())) {
          +          scanners.add(scanner);
          

In the above case, I don't see a difference between v7 and v12 w.r.t. populating the scanners List.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562493/5416-v13.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 2 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
          org.apache.hadoop.hbase.replication.TestReplication

          -1 core zombie tests. There are 4 zombie test(s):

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3718//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          @Ted
          You are correct. My bad. So if

           if (this.filter == null || !scan.doLoadColumnFamiliesOnDemand()
          +          || this.filter.isFamilyEssential(entry.getKey())) {
          

So even if the first two conditions fail, if isFamilyEssential = true then it will add to scanners.
But if isFamilyEssential = false, then what will happen w.r.t. the stopRow handling that was added in the latest patches? What do you feel?
Because current is always got from the storeHeap.

          Ted Yu added a comment -

I need to go over the logic one more time. But for non-essential column families, I think it is correct too.

          Lars Hofhansl added a comment -

I will try to make a 0.94 patch and do some performance testing; if that turns out well, let's pull this into 0.94 as well.

          Lars Hofhansl added a comment -

v13 is undoing some of the optimization I had put in. HRegion.RegionScannerImpl.isStopRow should continue to take a byte[], offset, and length, rather than a KeyValue.

Lars Hofhansl added a comment - edited

          Even when I fix that, it does slow down the tight loop case by about 10-20%.
(That is a scan on a single column with a filter that happens to filter everything, and everything is in the block cache.)

Without this patch I can scan 20m single-column rows in ~5.8s; with this patch it's ~6.9s.
          This is with onDemand loading disabled.

          Lars Hofhansl added a comment -

          This is probably due to the extra calls to KeyValueHeap.peek().

          Lars Hofhansl added a comment -

If the while loop in populateResults is changed back to the original do/while loop, the results are closer to what they were before. I understand why you changed it to the while loop (it looks better), but it does unnecessary peeks into the heap.
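
To illustrate the point, a sketch of the two loop shapes (HeapLike is a stand-in for KeyValueHeap here; assume next() removes and returns the top and peek() does not advance). The caller of populateResults has already peeked, so the do/while form consumes first and saves one peek per call:

          import java.util.List;

          class PopulateLoopSketch {
            interface HeapLike<T> { T peek(); T next(); }  // next() removes and returns the top

            // while form: peeks once per iteration plus one redundant peek up front,
            // even though the caller has already peeked at the heap.
            static <T> void whileForm(HeapLike<T> heap, List<T> out) {
              while (heap.peek() != null) {
                out.add(heap.next());
              }
            }

            // do/while form: trusts the caller's peek, consumes first, peeks after;
            // one fewer peek per call. Assumes the caller saw a non-null top.
            static <T> void doWhileForm(HeapLike<T> heap, List<T> out) {
              T kv;
              do {
                out.add(heap.next());
                kv = heap.peek();
              } while (kv != null);
            }
          }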

          Lars Hofhansl added a comment -

          It's still 5.8s vs 6.2s (without vs with patch).

          Lars Hofhansl added a comment -

          Preliminary 0.94 patch.
          The scan flag is communicated via the scan attributes (so that the wire protocol does not have to change).
I ran the two included/changed tests and did the aforementioned performance tests.
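
Roughly, the client side of that would look like the following; Scan.setAttribute is existing 0.94 API, but the attribute key below is only a placeholder, not necessarily the one the patch defines:

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.util.Bytes;

          class OnDemandScanAttributeSketch {
            static Scan makeScan() {
              Scan scan = new Scan();
              // Hypothetical attribute key; the 0.94 patch defines its own name.
              scan.setAttribute("scan.loadColumnFamiliesOnDemand", Bytes.toBytes(true));
              return scan;
            }
          }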

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562737/5416-0.94-v1.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3772//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          Here's another experiment I did against the 0.94 patch:
          Two column families, a SingleColumnValueFilter that filters everything based on the 1st CF.
          When I run this against 2 CFs (with on demand disabled) it takes about 60s.
When I run this against a single CF it takes ~30s.
When I enable on demand loading and run against both CFs it still takes 60s. I would have expected that it would have been closer to 30s.

          Ted Yu added a comment -

          Thanks for doing the experiments, Lars.
Can you take a look at the use case shown in the test in the patch to see if it matches the way you did the experiment?

          Lars Hofhansl added a comment -

There's something amiss here. Filter.filterRow is not called anywhere in trunk except for some filter wrappers and in tests (so we could just remove it, or there's a more general bug in trunk).
It seems it is not used. SCVF does not implement filterRow(List<KeyValue>), so this entire thing cannot work (I did these checks on the trunk patch).

          I assumed what I tested was the use case: You have multiple column families and use a SCVF based on one of the families, which should then avoid even touching the other CFs.

          Ted Yu added a comment -

          @Lars:
          Here is how SingleColumnValueFilter is used in the test:

          +    SingleColumnValueFilter filter = new SingleColumnValueFilter(
          +        cf_essential, col_name, CompareFilter.CompareOp.EQUAL, flag_yes);
          +    filter.setFilterIfMissing(true);
          

          Here is the reason for setting filterIfMissing to true:

          +  public boolean isFamilyEssential(byte[] name) {
          +    return !this.filterIfMissing || Bytes.equals(name, this.columnFamily);
          

          I think what you experienced was that the default value of false for filterIfMissing was in effect and both column families were examined.

We should definitely emphasize this intricacy in the release notes.
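
Putting it together, a client that wants the optimization would set up the scan like this (family, qualifier and value are example names; setLoadColumnFamiliesOnDemand is the trunk API from the release note):

          import org.apache.hadoop.hbase.client.Scan;
          import org.apache.hadoop.hbase.filter.CompareFilter;
          import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
          import org.apache.hadoop.hbase.util.Bytes;

          class EssentialFamilyScanSketch {
            static Scan makeScan() {
              SingleColumnValueFilter filter = new SingleColumnValueFilter(
                  Bytes.toBytes("flags"), Bytes.toBytes("f"),
                  CompareFilter.CompareOp.EQUAL, Bytes.toBytes("yes"));
              // Without this, isFamilyEssential() reports every family as essential
              // and nothing is deferred.
              filter.setFilterIfMissing(true);
              Scan scan = new Scan();
              scan.setFilter(filter);
              scan.setLoadColumnFamiliesOnDemand(true);  // the feature is off by default
              return scan;
            }
          }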

          Lars Hofhansl added a comment -

          Thanks Ted. You're right. I'll redo my test in a bit.

          Lars Hofhansl added a comment -

Actually, that should make it work with my 0.94 patch. In 0.96, Filter.filterRow() is never called, so it would never skip any rows that way.

          Ted Yu added a comment -

          @Lars:
          Pardon me for not explaining in more detail.
          In HRegion.RegionScannerImpl():

                if (scan.hasFilter()) {
                  this.filter = new FilterWrapper(scan.getFilter());
          

          In FilterWrapper:

            public void filterRow(List<KeyValue> kvs) {
              //To fix HBASE-6429, 
              //Filter with filterRow() returning true is incompatible with scan with limit
              //1. hasFilterRow() returns true, if either filterRow() or filterRow(kvs) is implemented.
              //2. filterRow() is merged with filterRow(kvs),
              //so that to make all those row related filtering stuff in the same function.
              this.filter.filterRow(kvs);
              if (!kvs.isEmpty() && this.filter.filterRow()) {
                kvs.clear();
              }
          

          If you have further questions, please let me know.

          Once you confirm that the 0.94 patch works, I will attach patch v14 for trunk which addresses the two issues you raised this afternoon.

          Lars Hofhansl added a comment -

          Cool... You're right here too. We should probably merge these two in 0.94 as well, but that's another story.
(It's all getting a bit messy. We have Filter, FilterBase, FilterWrapper. Now we even have logic in FilterWrapper.)

          Lars Hofhansl added a comment -

          I confirmed that when I set filterIfMissing the scan time is very close to 30s.
Note that I am not giving a +1 yet. This needs a bit more looking at. Especially the slowdown on the regular path is a bit disconcerting.

          Ted Yu added a comment -

          Patch v14 addresses the two issues Lars raised.

          Thanks to Lars who verified that this feature is helpful in the described scenario.

For the regular path, what we can do is optimize away all the new code when on-demand column family loading is disabled. We should be able to spot the difference from the existing code more easily.
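
In other words, something of this shape (method names invented for the sketch, with stubs standing in for the two code paths): decide once per scanner and leave the pre-patch path untouched when the feature is off:

          import java.util.List;

          class RegionScannerPathSketch {
            private final boolean loadCfsOnDemand;

            RegionScannerPathSketch(boolean loadCfsOnDemand) {
              this.loadCfsOnDemand = loadCfsOnDemand;
            }

            boolean nextInternal(List<Object> results, int limit) {
              if (!loadCfsOnDemand) {
                return nextClassic(results, limit);        // pre-patch loop, untouched
              }
              return nextWithJoinedHeap(results, limit);   // new joined-heap path
            }

            // Stubs for the sketch; the real methods hold the two scan loops.
            private boolean nextClassic(List<Object> results, int limit) { return false; }
            private boolean nextWithJoinedHeap(List<Object> results, int limit) { return false; }
          }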

          Lars Hofhansl added a comment -

          It might help to inline the populateResults code again. That would save us an extra call to KeyValueHeap.peek().

          Lars Hofhansl added a comment -

Or populateResult could return the nextKv to indicate that we're not done (null to indicate that a limit was hit).
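
That suggestion would look roughly like this (stand-in types again; a sentinel is one way to keep "limit hit" distinguishable from "end of heap"):

          import java.util.Arrays;
          import java.util.List;

          class PopulateResultSketch {
            interface KeyValueLike { byte[] getRow(); }
            interface HeapLike { KeyValueLike peek(); KeyValueLike next(); }

            // Sentinel distinguishing "limit hit" from "end of heap" (both would
            // otherwise be null at the call site).
            static final KeyValueLike KV_LIMIT = new KeyValueLike() {
              @Override public byte[] getRow() { return null; }
            };

            // Drains the current row into results. Returns KV_LIMIT if the limit
            // stopped us mid-row, otherwise the first KV of the next row (or null
            // at the end of the heap), so the caller needs no extra peek().
            static KeyValueLike populateResult(HeapLike heap, List<KeyValueLike> results,
                int limit, byte[] currentRow) {
              KeyValueLike nextKv;
              do {
                results.add(heap.next());
                if (limit > 0 && results.size() == limit) {
                  return KV_LIMIT;
                }
                nextKv = heap.peek();
              } while (nextKv != null && Arrays.equals(nextKv.getRow(), currentRow));
              return nextKv;
            }
          }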

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562752/5416-v14.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplication

          -1 core zombie tests. There are 2 zombie test(s):

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3778//console

          This message is automatically generated.

          Ted Yu added a comment -

          Patch v15 adopts Lars' suggestion above and saves one heap.peek() call.

          Lars Hofhansl added a comment -

          Lemme do the same to the 0.94 patch and try my test again.
          (If we want this in 0.94 we have to maintain these two patches separately now, as 0.94 and 0.96 are sufficiently different in this area. Maybe 0.96 is indeed better.)

Lars Hofhansl added a comment - edited

          Same changes for the 0.94 patch.
          With this change I cannot detect a performance detriment when disabled (will try with a larger data set too).

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562766/5416-0.94-v2.txt
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 6 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3781//console

          This message is automatically generated.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562757/5416-v15.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestHCM
          org.apache.hadoop.hbase.replication.TestMasterReplication
          org.apache.hadoop.hbase.TestHBaseTestingUtility

          -1 core zombie tests. There are 3 zombie test(s): at org.apache.hadoop.hbase.io.encoding.TestUpgradeFromHFileV1ToEncoding.testUpgrade(TestUpgradeFromHFileV1ToEncoding.java:83)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3780//console

          This message is automatically generated.

          Ted Yu added a comment -

Patch v16 corrects spelling and wraps a long line.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12562796/5416-v16.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:

          -1 core zombie tests. There are 4 zombie test(s): at org.apache.hadoop.hbase.io.encoding.TestUpgradeFromHFileV1ToEncoding.testUpgrade(TestUpgradeFromHFileV1ToEncoding.java:83)

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3783//console

          This message is automatically generated.

          Ted Yu added a comment -

          From https://builds.apache.org/job/PreCommit-HBASE-Build/3783//console:

          "pool-1-thread-1" prio=10 tid=0x773bc800 nid=0x451 in Object.wait() [0x772fe000]
             java.lang.Thread.State: WAITING (on object monitor)
            at java.lang.Object.wait(Native Method)
  - waiting on <0x803da7e8> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
  at java.lang.Thread.join(Thread.java:1186)
  - locked <0x803da7e8> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
            at java.lang.Thread.join(Thread.java:1239)
            at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:245)
            at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:430)
            at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:501)
            at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:856)
            at org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:826)
            at org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster.after(TestSplitTransactionOnCluster.java:109)
          
          Ted Yu added a comment -

A local run of the tests flagged by the QA script was successful:

          172 mt -Dtest=TestRestartCluster,TestSplitTransactionOnCluster
          173 mt -Dtest=TestUpgradeFromHFileV1ToEncoding

          ramkrishna.s.vasudevan added a comment -

I can check on this tomorrow with the latest patch and the doubts I had earlier.

          ramkrishna.s.vasudevan added a comment -

          @Ted
Went through the patch and wrote some tests to see that things work fine; I did not do any perf tests, though.
          Looks good to me. Thanks for doing this.

          Ted Yu added a comment -

          Thanks for the review, Ram.

          +1 from me too.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563428/5416-v16.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestSplitTransaction

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3876//console

          This message is automatically generated.

          ramkrishna.s.vasudevan added a comment -

          +1 from me.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12563434/5416-v16.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 9 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.replication.TestReplicationWithCompression

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3877//console

          This message is automatically generated.

          Ted Yu added a comment -

          I plan to integrate patch v16 Monday (the 7th) afternoon.

          Lars Hofhansl added a comment -

Can we first rule out that there is a more general approach?
For example: what about declaring the columns that will have filters applied to them in the scan object? Maybe there are more ways to look at this.
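
A hypothetical sketch of that alternative (setFilterFamilies is an invented name for illustration only, not an actual HBase method; imports omitted):

    // Hypothetical alternative API: the Scan itself declares which families
    // the filter reads, instead of the filter reporting them.
    Scan scan = new Scan();
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("meta"),
        Bytes.toBytes("flag"), CompareOp.EQUAL, Bytes.toBytes("1")));
    scan.setFilterFamilies(Bytes.toBytes("meta")); // invented method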

I would also feel better if some of the Facebook folks took a look at this (Mikhail Bautin, Karthik Ranganathan, Kannan Muthukkaruppan).

          Generally I like the patch, and I think we should even have this in 0.94.

          Sergey Shelukhin added a comment -

Btw, the integration test for this is in HBASE-7383. I will run it locally for the latest patch.

          Sergey Shelukhin added a comment -

          Btw, the test appears to pass.

          Ted Yu added a comment -

          Mikhail Bautin, Karthik Ranganathan, Kannan Muthukkaruppan:
          Your opinion would be helpful.

          Thanks

          Karthik Ranganathan added a comment -

I think the specific description (of making filters apply to only some CFs) is a good idea. But if we continue down this path of generalizing filters, it could lead to an explosion of ad-hoc filters. In that case, it might be better to expose more co-processor hooks. Overall, +1 (only skimmed the changes though).

          Ted Yu added a comment -

          Thanks for the review, Karthik.
          I will think about how co-processor hooks can be used to reduce changes in filters.

          Lars Hofhansl added a comment -

I don't necessarily think that ad hoc filters are bad. They are nice in that they are per store, can do skip scans, etc. They fill a different use case compared to coprocs.
If anything, this might be an impetus to support filters better (load them dynamically like coprocs, maybe even invent a general filter description, etc.).

          Since nobody has better API ideas I'm +1 on committing (both 0.94 and 0.96).

          Ted Yu added a comment -

          We have 4 +1's for this JIRA.
          It is time to integrate.
          I plan to do that in trunk by this evening.

          @Lars:
I haven't run the test suite for the 0.94 patch; do you want to integrate to the 0.94 branch?

          Thanks

          Ted Yu added a comment -

          Integrated to trunk.

          Thanks for the patch, Max and Sergey.

          Thanks for the review, Stack, Lars, Ram and Karthik.

          Hudson added a comment -

          Integrated in HBase-TRUNK #3716 (See https://builds.apache.org/job/HBase-TRUNK/3716/)
          HBASE-5416 Improve performance of scans with some kind of filters (Max Lapan and Sergey) (Revision 1431103)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterWrapper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SkipFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/WhileMatchFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestSingleColumnValueExcludeFilter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestJoinedScanners.java
          Lars Hofhansl added a comment -

Looks like the trunk patch was not changed since I made the 0.94 patch (except for wrapping the long line, etc.).
I'll commit in the next day or so (unless somebody objects).

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #338 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/338/)
          HBASE-5416 Improve performance of scans with some kind of filters (Max Lapan and Sergey) (Revision 1431103)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/ClientProtos.java
          • /hbase/trunk/hbase-protocol/src/main/protobuf/Client.proto
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterWrapper.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/SkipFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/WhileMatchFilter.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestSingleColumnValueExcludeFilter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestJoinedScanners.java
          Sergey Shelukhin added a comment -

Is this JIRA unresolved pending the 0.94 commit? Just checking, as it shows up in my filter.

          Ted Yu added a comment -

For the 0.94 patch, I saw the following on my Mac:

          testScanner_JoinedScannersWithLimits(org.apache.hadoop.hbase.regionserver.TestHRegion)  Time elapsed: 0.001 sec  <<< FAILURE!
          junit.framework.AssertionFailedError: expected:<3> but was:<1>
            at junit.framework.Assert.fail(Assert.java:50)
            at junit.framework.Assert.failNotEquals(Assert.java:287)
            at junit.framework.Assert.assertEquals(Assert.java:67)
            at junit.framework.Assert.assertEquals(Assert.java:199)
            at junit.framework.Assert.assertEquals(Assert.java:205)
            at org.apache.hadoop.hbase.regionserver.TestHRegion.testScanner_JoinedScannersWithLimits(TestHRegion.java:2976)
          
          Lars Hofhansl added a comment -

          I ran all 0.94 tests. They all pass on my machines.

          Ted Yu added a comment - - edited

          Test output from 0.94.
          Here is my environment:

          Darwin TYus-MacBook-Pro.local 12.2.1 Darwin Kernel Version 12.2.1: Thu Oct 18 12:13:47 PDT 2012; root:xnu-2050.20.9~1/RELEASE_X86_64 x86_64
          TYus-MacBook-Pro:hbase-snapshot-0103 tyu$ java -version
          java version "1.6.0_37"
          Java(TM) SE Runtime Environment (build 1.6.0_37-b06-434-11M3909)
          Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01-434, mixed mode)

          Lars Hofhansl added a comment -

          Does this test fail consistently for you?

          Lars Hofhansl added a comment -

On my machine this test fails too in 0.94.

          Lars Hofhansl added a comment -

Found the problem. In the part of the patch that I had applied manually, I mistook {{kv != KV_LIMIT}} for {{kv == KV_LIMIT}}.
          No idea how on earth the test on my server machine passed.

          Lars Hofhansl added a comment -

          Patch that has this fixed.
          I think I should do a bit more double-checking before committing.

          Ted Yu added a comment -

Thanks, Lars, for the finding. I am running the test suite based on patch v3.
Will report back if there is any abnormality.
I was also looking for long lines:

          +  public static final String LOAD_CFS_ON_DEMAND_CONFIG_KEY = "hbase.hregion.scan.loadColumnFamiliesOnDemand";
          

          nit: wrap long line above.

          +     * @param heap KeyValueHeap to fetch data from. It must be positioned on correct row before call.
          

          Long line: it would be 100 characters wide if the trailing period is removed.

          +          stopRow = nextKv == null || isStopRow(nextKv.getBuffer(), nextKv.getRowOffset(), nextKv.getRowLength());
          
          Lars Hofhansl added a comment -

          Thanks Ted. Will wrap lines upon commit (unless there's another version needed).

          Ted Yu added a comment -

          The following tests failed locally:
TestMultiSlaveReplication, TestMasterReplication, TestZKLeaderManager

          The first two failed without the patch, too.

          I think patch v3 should be good to go.

          Lars Hofhansl added a comment -

          I think I convinced myself that this is good to go for 0.94.

Going forward, this could be useful for all kinds of filters. I can see many scenarios where we want filters to be evaluated on selected CFs only, including the other CFs only when the row is not filtered out based on the former.

          Sergey Shelukhin added a comment -

          Can this be counted as +1 to commit?

          Lars Hofhansl added a comment -

          Yes. I will commit this today.

          Lars Hofhansl added a comment -

          Committed to 0.94. Thanks for the patch!

          Hudson added a comment -

          Integrated in HBase-0.94 #732 (See https://builds.apache.org/job/HBase-0.94/732/)
          HBASE-5416 Improve performance of scans with some kind of filters. (Sergey Shelukhin) (Revision 1433195)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SkipFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/WhileMatchFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestSingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Hudson added a comment -

          Integrated in HBase-TRUNK #3745 (See https://builds.apache.org/job/HBase-TRUNK/3745/)
          HBASE-7383 create integration test for HBASE-5416 (improving scan performance for certain filters) (Sergey) (Revision 1433224)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestDataGenerator.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestKVGenerator.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLoadTestKVGenerator.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestLazyCfLoading.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedAction.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadSequential.java
          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #348 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/348/)
          HBASE-7383 create integration test for HBASE-5416 (improving scan performance for certain filters) (Sergey) (Revision 1433224)

          Result = FAILURE
          tedyu :
          Files :

          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestDataGenerator.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/util/test/LoadTestKVGenerator.java
          • /hbase/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/util/TestLoadTestKVGenerator.java
          • /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestLazyCfLoading.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedAction.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadSequential.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #95 (See https://builds.apache.org/job/HBase-0.94-security/95/)
          HBASE-5416 Improve performance of scans with some kind of filters. (Sergey Shelukhin) (Revision 1433195)

          Result = SUCCESS
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SkipFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/WhileMatchFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestSingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #11 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/11/)
          HBASE-5416 Improve performance of scans with some kind of filters. (Sergey Shelukhin) (Revision 1433195)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Scan.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/FilterList.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/SkipFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/filter/WhileMatchFilter.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/filter/TestSingleColumnValueExcludeFilter.java
          • /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
          Dave Latham added a comment -

I have a class that directly implements the Filter interface. This change looks to me like it will prevent me from doing a rolling upgrade of region servers to 0.94.5 while my client is using this filter on scans, because the filter will fail to implement the changed interface. Is that correct? Is that acceptable?

          Sergey Shelukhin added a comment -

Hmm, this is correct. I am not sure whether this is acceptable; IIRC I saw someone pondering that (on the mailing list?) but deciding that most people would use FilterBase, though I cannot find it now.
How hard is it to change the filter to use FilterBase and deploy that replacement first?

          Lars Hofhansl Do you have an opinion?

          stack added a comment -

          Lars Hofhansl Would suggest backing out this change if it breaks compatibility, especially if it breaks compatibility for our homies in SOMA, SF.

          stack added a comment -

          Lars Hofhansl Thanks for adding it to trunk. Max Lapan Any chance of a paragraph on your fancy new feature? If you draft it – including the possibilities your patch enables – I'll take care of getting it into the ref guide. Good stuff.

          Ted Yu added a comment -

The 0.94 patch did introduce a subtle issue.

But this feature is useful. See the email thread entitled 'Co-Processor in scanning the HBase's Table' on the mailing list.

The cause seems to be the addition of a new method to the Filter interface. Can we do the following (see the sketch after this list)?
1. introduce a new interface, say Filter2 (open to other names), where isFamilyEssential(byte[] name) is added
2. move isFamilyEssential(byte[] name) out of the Filter interface
3. let FilterBase implement Filter2
4. declare the filter field of RegionScannerImpl to be of type Filter2
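
A minimal sketch of that proposal, assuming the Filter2 name and simplified signatures (not the actual patch):

    // Steps 1 and 2: the new method lives in a separate interface, not in Filter.
    public interface Filter2 {
      boolean isFamilyEssential(byte[] name);
    }

    // Step 3: FilterBase implements it with a safe default, so subclasses
    // inherit the old load-everything behavior unless they override it.
    public abstract class FilterBase implements Filter, Filter2 {
      public boolean isFamilyEssential(byte[] name) {
        return true; // every family essential = pre-patch behavior
      }
      // (existing FilterBase methods unchanged)
    }

    // Step 4: RegionScannerImpl would type its filter field as Filter2,
    // so classes implementing Filter directly are never asked.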

          Since 0.94.5 has been rolled out, it is another kind of regression if this feature is taken out.

          My two cents.

          Lars Hofhansl added a comment -

          IMHO this is similar to the coprocessor changes we had made in some 0.94 point releases that also break coprocessors (unless they derive from classes like BaseRegionObserver). In fact our own Phoenix folks ran into issues with this.

These are somewhat internal APIs and we should be able to change them... Although I admit Filters are more stable in terms of APIs than coprocessors. Still, I'd vote for keeping this patch, unchanged.

          Dave Latham, I'd be interested in why you had to implement Filter directly rather than extending FilterBase.

          stack added a comment -

There is no argument that this is a 'useful' feature. 'Useful' is not a good enough reason to break a 'public' interface. Why would we put any obstacle in the way of the group that is running the largest HBase deploy? Don't they have enough headache already w/o having to jump a gratuitous incompatibility hurdle? Does anyone even 'need' this feature in 0.94? Suggest removing it for 0.94.6 so our man Dave can just go there.

          Ted Yu added a comment -

          Dave Latham, stack:
What do you think of my proposal above (@23/Feb/13 06:11)?

Would that allow Dave to get over the hurdle?

          If you think so, I can open a new JIRA with a patch.

          Thanks

          Lars Hofhansl added a comment -

          A rolling upgrade is still possible if a stub isFamilyEssential(...) is added to the Filter implementation before the rolling upgrade.
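
For illustration, such a stub might look like this (a sketch, assuming a custom class that implements Filter directly; returning true preserves the pre-patch behavior of loading every family):

    // Added to the custom Filter implementation before the rolling upgrade.
    // No @Override annotation, so the same source also compiles against
    // the older interface that lacks this method.
    public boolean isFamilyEssential(byte[] name) {
      return true; // treat every family as essential (old behavior)
    }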

          Anyway, I am not attached to this feature in 0.94.

          At the same time I do not want to cripple our ability to make some changes to these APIs. We have not frozen the coprocessor APIs and neither should we freeze the Filter APIs.
What if somebody had implemented a coprocessor API that we had changed? In the past we have stated that we will change these (coprocessor) APIs.

          Ted, I think your approach will just make things more complicated going forward. And I'd prefer to either keep this or revert altogether.

          Lars Hofhansl added a comment -

          One last comment

The reason why I am arguing for keeping this is that it is one of the few features that allows HBase to make use of its columnar nature to speed up queries.
HBase is not known for its scan performance, and this is one feature to point to where we allow HBase to not even look at another column family unless a filter is matched, for potentially significant speedups. I was planning on extending this to other filters as well.
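
As a concrete illustration (a sketch; the family and qualifier names are invented, imports omitted), a scan like the following touches the large family only for rows that the filter on the small family accepts:

    // The filter reads only the small "meta" family; the large "data"
    // family is loaded lazily, only for rows the filter accepts.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("meta"));
    scan.addFamily(Bytes.toBytes("data"));
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("meta"),
        Bytes.toBytes("flag"), CompareOp.EQUAL, Bytes.toBytes("1")));
    // Enable the on-demand loading path for this scan; the
    // "hbase.hregion.scan.loadColumnFamiliesOnDemand" setting supplies
    // the cluster-wide default.
    scan.setLoadColumnFamiliesOnDemand(true);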

          Ted Yu added a comment -

I am with Lars on this one. The feature should be part of 0.94.

          Ted, I think your approach will just make things more complicated going forward

Another option is to drop the new method from the Filter interface. The server side would depend on FilterBase, which has the stub isFamilyEssential().
HRegion.RegionScannerImpl can use an instanceof check, which is fast.
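
Roughly (a sketch of this suggestion, simplified; not an actual patch):

    // Only filters extending FilterBase can mark a family non-essential;
    // a class implementing Filter directly keeps the old behavior.
    boolean essential = true;
    if (filter instanceof FilterBase) {
      essential = ((FilterBase) filter).isFamilyEssential(family);
    }
    // Essential families are scanned up front; the rest load on demand.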

          Ted Yu added a comment -

          A small patch illustrating my second approach.
          All Filter tests passed based on this patch:

          Tests run: 108, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] 
          [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
          [INFO] Tests are skipped.
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          

          All scanner related tests also passed:

          Tests run: 65, Failures: 0, Errors: 0, Skipped: 0
          
          [INFO] 
          [INFO] --- maven-surefire-plugin:2.10:test (secondPartTestsExecution) @ hbase ---
          [INFO] Tests are skipped.
          [INFO] ------------------------------------------------------------------------
          [INFO] BUILD SUCCESS
          [INFO] ------------------------------------------------------------------------
          [INFO] Total time: 3:24.352s
          

If people think this approach is okay, I will open a new JIRA and attach it there.

          Dave Latham added a comment -

          How hard is it to change filter to use FilterBase and replace it first?

The change is very simple for us. It means we need to wait a bit before deploying the HBase upgrade until we can upgrade our client apps first, though. This is what we've decided to do, so this incompatibility is not going to be a blocker for us, just a slight delay.

          I'd be interested in why you had to implement Filter directly rather than extending FilterBase.

This particular Filter implementation was made as a wrapper around any other Filter as part of some experiments we were doing for more dynamic Filter classloading a couple of years back. I don't think there was a FilterBase class at the time, or we may have just chosen to make it a generic Filter (or actually RowFilterInterface back then) to make sure it implements and wraps every method.

          I think leaving the method in FilterBase only for 0.94 would be a good move. However, it's a bit tricky since 0.94.5 has already been released. If the method is dropped from Filter in 0.94.6 then we're saying 0.94.6 is compatible with everything but 0.94.5. However if you were unfortunate enough to start on 0.94.5 and implement Filter directly then you're going to break again. Perhaps that's a rare enough case.

          Ted Yu added a comment -

          @Dave:
0.94.5 was announced on 2013-02-16. If 0.94.6 is released within 10 days, the window for someone implementing the 0.94.5 version of the Filter interface is very short.

In hindsight, we should have implemented this feature in 0.94 without touching the Filter interface.
          We have a good lesson (for other interfaces).

          If you want to deploy 0.94.5 in the next few days, try not adding @Override in your Filter implementation.

Again, thanks for reporting this - other HBase users will benefit.

          Ted Yu added a comment -

          testScanner_JoinedScanners passed as well:

          Running org.apache.hadoop.hbase.regionserver.TestHRegion
          2013-02-23 09:07:06.614 java[57714:1203] Unable to load realm info from SCDynamicStore
          Tests run: 73, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 44.586 sec
          
          Lars Hofhansl added a comment -

          Moving the method to FilterBase is a great idea and a good compromise.

          Personally, I think implementing Filter directly is rare, and I would prefer to keep this in the interface since it is cleaner.
          isFamilyEssential is very useful for future scan performance enhancements; I would hate to see it vanish from the Filter interface again. (Incidentally, in trunk Filter is now a class, which would have allowed us to make this change without the compatibility problem.)

          As an alternative, can we add a note to the Javadoc of Filter advising against implementing it directly, in favor of extending FilterBase?

          Dave Latham, if this is a hassle for you, I think we're all in agreement that we should push the method down to FilterBase.
          stack, I think you'd prefer the push into FilterBase. Let's just do that.
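          For illustration, pushing the method down might look roughly like this in the 0.94 branch (an assumed shape, not the committed HBASE-7920 patch): Filter reverts to its pre-0.94.5 form, and FilterBase carries a conservative default, so only FilterBase subclasses see the new method. Filters that can narrow the scan then override it to return false for families they never read.

            // Assumed shape of the 0.94 change, not the committed patch.
            public abstract class FilterBase implements Filter {

              // Default: every column family is essential, i.e. the scan loads
              // all families before filtering, exactly the old behaviour.
              public boolean isFamilyEssential(byte[] name) {
                return true;
              }

              // ... existing FilterBase default implementations unchanged ...
            }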

          Dave Latham added a comment -

          It's not going to make a difference for me any longer as we're planning to move forward with an application update and then a 0.94.5 upgrade. However, it sounds like a good plan to move the method to FilterBase (in the 0.94 branch only) to preserve compatibility for anyone else who comes along.

          stack added a comment -

          It's not going to make a difference for me any longer as we're planning to move forward with an application update and then a 0.94.5 upgrade.

          So, you don't need us to back anything out?

          What do you think of my proposal above (@23/Feb/13 06:11)?

          Can't find what you are referring to, Ted Yu. If I search, I only see the above pointer.

          Dave Latham, if this is a hassle for you, I think we're all in agreement that we should push the method down to FilterBase.

          For 0.94? If it means a better compatibility story (with a hiccup, i.e. we warn folks about the problem in 0.94.5 but it's fixed in 0.94.6), then I'm for it.

          Ted Yu added a comment -

          If i search I only see the above pointer.

          I was referring to approach #1, which Lars said is too complicated.

          Ted Yu added a comment -

          Created HBASE-7920 to move the new method out of the Filter interface.

          Dave Latham added a comment -

          So, you don't need us to back anything out?

          That's right, we're just going to work around it.

          Lars Hofhansl added a comment -

          It feels a bit like an overreaction. Not many folks implement their own filters; of those, not many implement Filter directly; there are workarounds; and for Dave it no longer matters.

          Sergey Shelukhin added a comment -

          What is the patch in this JIRA for? It was closed months ago.


            People

            • Assignee: Sergey Shelukhin
            • Reporter: Max Lapan
            • Votes: 0
            • Watchers: 31
