HBase / HBASE-4469

Avoid top row seek by looking up ROWCOL bloomfilter

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The problem is that when seeking for the row/column in the HFile, the scanner first seeks to the top of the row in order to check for a row delete marker (delete family). However, if the ROWCOL Bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the Bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
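      The idea can be sketched in Java as follows. This is a minimal, non-authoritative sketch, not the HBase implementation: ToyRowColBloom, TopOfRowSeekSketch, recordDeleteFamily and needsTopOfRowSeek are hypothetical names, the hashing scheme is arbitrary, and it assumes (beyond what this issue states) that a DeleteFamily marker carries an empty qualifier and is therefore recorded in the ROWCOL Bloom filter under the key (row, empty qualifier).

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy Bloom filter keyed on (row, qualifier); illustrative only, not HBase's Bloom filter. */
final class ToyRowColBloom {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  ToyRowColBloom(int numBits, int numHashes) {
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // Double hashing over a combined (row, qualifier) hash; an arbitrary choice for this sketch.
  private int position(byte[] row, byte[] qualifier, int i) {
    int h1 = 31 * Arrays.hashCode(row) + Arrays.hashCode(qualifier);
    int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void add(byte[] row, byte[] qualifier) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(row, qualifier, i));
    }
  }

  /** false means "definitely absent"; true means "possibly present". */
  boolean mightContain(byte[] row, byte[] qualifier) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(row, qualifier, i))) {
        return false;
      }
    }
    return true;
  }
}

final class TopOfRowSeekSketch {
  // Assumption: a DeleteFamily marker has an empty qualifier.
  static final byte[] EMPTY_QUALIFIER = new byte[0];

  /** Hypothetical hook: writing a DeleteFamily marker also records (row, "") in the ROWCOL bloom. */
  static void recordDeleteFamily(ToyRowColBloom bloom, byte[] row) {
    bloom.add(row, EMPTY_QUALIFIER);
  }

  /**
   * Before: always seek to the top of the row to look for a DeleteFamily marker.
   * After: seek there only if the ROWCOL bloom says such a marker might exist.
   */
  static boolean needsTopOfRowSeek(ToyRowColBloom bloom, byte[] row) {
    return bloom.mightContain(row, EMPTY_QUALIFIER);
  }

  public static void main(String[] args) {
    ToyRowColBloom bloom = new ToyRowColBloom(1024, 3);
    byte[] deletedRow = "row-1".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    recordDeleteFamily(bloom, deletedRow);
    System.out.println(needsTopOfRowSeek(bloom, deletedRow)); // true: Bloom filters have no false negatives
  }
}
```

      A true answer only means "possibly present", so the scanner falls back to the top-of-row seek it always did before; a false answer is exact, which is what makes skipping the seek safe.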


          Activity

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2235/
          -----------------------------------------------------------

          Review request for hbase.

          Summary
          -------

          The problem is that when seeking for the row/column in the HFile, the scanner first seeks to the top of the row in order to check for a row delete marker (delete family).
          However, if the ROWCOL Bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the Bloom filter.
          We can take advantage of this fact to avoid seeking to the top of the row.

          Also, update the TestBlocksRead unit tests, since most of the block read counts have dropped to lower numbers.

          Evaluation:
          In TestSeekingOptimization, the optimization previously saved 31.6% of seek operations.
          Now it saves about 41.82% of seek operations,
          about 10 percentage points more.

          ======================
          Before this diff:
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

          =====================
          After this diff:
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          =====================
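          A quick way to check the figures above, as a hypothetical Java helper (SeekSavings is not part of the patch): the percentages are relative to the 2506-seek baseline, so "10% more" means about 10 percentage points, or 256 fewer seeks (1714 down to 1458).

```java
/** Recomputes the TestSeekingOptimization percentages quoted above (illustrative helper). */
final class SeekSavings {
  /** Seeks performed with the optimization, as a percentage of the unoptimized count. */
  static double percentOfBaseline(int baseline, int optimized) {
    return 100.0 * optimized / baseline;
  }

  /** Percentage of seeks avoided relative to the unoptimized count. */
  static double savingsPercent(int baseline, int optimized) {
    return 100.0 - percentOfBaseline(baseline, optimized);
  }

  public static void main(String[] args) {
    int baseline = 2506; // total seeks without optimization
    int before = 1714;   // with optimization, before this diff
    int after = 1458;    // with optimization, after this diff
    System.out.printf("before: %.2f%% of baseline, savings %.2f%%%n",
        percentOfBaseline(baseline, before), savingsPercent(baseline, before));
    System.out.printf("after:  %.2f%% of baseline, savings %.2f%%%n",
        percentOfBaseline(baseline, after), savingsPercent(baseline, after));
    System.out.printf("extra savings: %.2f percentage points (%d fewer seeks)%n",
        savingsPercent(baseline, after) - savingsPercent(baseline, before), before - after);
  }
}
```

          Running main prints 68.40% / 31.60% before, 58.18% / 41.82% after, and 10.22 percentage points (256 fewer seeks) of extra savings, matching the log lines.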

          Thanks to Mikhail and Kannan for their help and discussion.

          This addresses bug HBASE-4469.
          https://issues.apache.org/jira/browse/HBASE-4469

          Diffs
          -----


          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 7b0b9e6
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 8dd8a68
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java abccea4

          Diff: https://reviews.apache.org/r/2235/diff

          Testing
          -------

          Ran all the unit tests.
          Two unit tests fail both with and without my change:
          TestDistributedLogSplitting
          TestHTablePool

          Thanks,

          Liyin

          Ted Yu added a comment -

          I don't see TestBlocksRead in the latest review.

          Liyin Tang added a comment -

          Yes, I didn't change the TestBlocksRead unit test; it passes successfully.

          Nicolas Spiegelberg added a comment -

          +1. lgtm

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2235/#review2417
          -----------------------------------------------------------

          +1.

          Nice optimization Liyin. Changes look good. [This is running nicely on our internal branch.]

          • Kannan

          (Quoting the review request from Liyin, updated 2011-10-06 17:17:23.)

          Ted Yu added a comment -

          +1 on patch.
          Nice job.

          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2235/#review2541
          -----------------------------------------------------------

          Ship it!

          Patch looks good. Small. Only works if bloom filters are already on?

          • Michael

          Jonathan Gray added a comment -

          @stack, yeah, this version only works if you have ROWCOL blooms enabled. The generic version is going to be implemented over in HBASE-4532.

          Liyin Tang added a comment -

          HBASE-4532 will enable the delete family Bloom filter only when the ROW or NONE Bloom filter type is configured,
          because if there is a delete family marker in the store file, the ROWCOL Bloom filter already has this information.

          stack added a comment -

          OK. I was confused. I'm +0 on this patch (since I am not familiar with what is going on here; it looks innocuous enough on review). Jon, you going to commit?

          Liyin Tang added a comment -

          @stack: HBASE-4469 avoids the top row seek when the ROWCOL Bloom filter is enabled,
          and HBASE-4532 will avoid it when the ROW or NONE Bloom filter type is configured.
          So HBASE-4469 + HBASE-4532 together cover all the cases.

          This one needs to be committed first.
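          The split between the two issues can be sketched in Java. This is an illustration under stated assumptions, not HBase code: the BloomType enum here is a standalone stand-in for the three Bloom filter types named in this thread, the two probe parameters are hypothetical, the ROW/NONE branch models the delete family Bloom filter that HBASE-4532 proposes (not yet implemented at the time of this comment), and the ROWCOL branch assumes the DeleteFamily marker is recorded as (row, empty qualifier).

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

/** Sketch: which filter answers "might this store file hold a DeleteFamily marker for this row?" */
final class DeleteFamilyLookup {
  // Standalone stand-in for the Bloom filter types discussed in the thread (not HBase's enum).
  enum BloomType { NONE, ROW, ROWCOL }

  // Assumption: a DeleteFamily marker has an empty qualifier.
  static final byte[] EMPTY_QUALIFIER = new byte[0];

  /**
   * Returns true if the scanner must still seek to the top of the row.
   * rowColBloom: hypothetical probe of the file's ROWCOL Bloom filter, keyed on (row, qualifier).
   * deleteFamilyBloom: hypothetical probe of a per-file delete family Bloom filter (HBASE-4532 idea).
   */
  static boolean mustSeekTopOfRow(BloomType type, BiPredicate<byte[], byte[]> rowColBloom,
      Predicate<byte[]> deleteFamilyBloom, byte[] row) {
    switch (type) {
      case ROWCOL:
        // HBASE-4469: the ROWCOL filter already knows about (row, "") if a DeleteFamily exists.
        return rowColBloom.test(row, EMPTY_QUALIFIER);
      case ROW:
      case NONE:
        // HBASE-4532 (proposed): a dedicated delete family filter, since a ROW filter
        // only says whether the row has any cells, not whether it has a DeleteFamily marker.
        return deleteFamilyBloom.test(row);
      default:
        throw new IllegalArgumentException("unknown Bloom type: " + type);
    }
  }

  public static void main(String[] args) {
    byte[] row = "row-1".getBytes(java.nio.charset.StandardCharsets.UTF_8);
    // Pretend neither filter has seen a DeleteFamily marker for this row.
    BiPredicate<byte[], byte[]> rowCol = (r, q) -> false;
    Predicate<byte[]> deleteFamily = r -> false;
    for (BloomType type : BloomType.values()) {
      System.out.println(type + ": seek to top of row = "
          + mustSeekTopOfRow(type, rowCol, deleteFamily, row));
    }
  }
}
```

          Either way, a negative answer lets the scanner skip the top-of-row seek, while a positive answer keeps the existing behavior.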

          Jonathan Gray added a comment -

          Liyin, can you post the final patch to this JIRA? I will commit. Thanks!

          Liyin Tang added a comment -

          Cool, I just downloaded the patch from Review Board (https://reviews.apache.org/r/2235/) and attached it here.
          Thanks, Jonathan.

          Jonathan Gray added a comment -

          Thanks Liyin. Unfortunately, because the RB integration isn't very tight, to follow Apache protocol you need to attach the patch to the JIRA and select the radio button that assigns it to Apache.

          This also helps to ensure that there's no confusion about which version was committed and that we don't have a hard dependency on RB in any way.

          It'll all be second nature before you know it.

          Jonathan Gray added a comment -

          Committed to trunk.

          Jonathan Gray added a comment -

          What is the protocol now? This needs to go into the fb-89 branch, so do I keep this JIRA open until that happens, or should we just add some fb-89-pending tag or something?

          Jonathan Gray added a comment -

          (I'm not putting this in the 0.92 branch because it is a feature.)

          Liyin Tang added a comment -

          @Jonathan,
          For this JIRA specifically, it was committed to the internal 89-fb branch before the public 89-fb branch was cut.
          So it should already be in the public 89-fb branch.

          Jonathan Gray added a comment -

          Got it, thanks Liyin! Nice work!

          Hudson added a comment -

          Integrated in HBase-TRUNK #2325 (See https://builds.apache.org/job/HBase-TRUNK/2325/)
          HBASE-4469 Avoid top row seek by looking up bloomfilter (liyin via jgray)

          jgray :
          Files :

          • /hbase/trunk/CHANGES.txt
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java

            People

            • Assignee:
              Liyin Tang
            • Reporter:
              Liyin Tang
            • Votes:
              0
            • Watchers:
              4
