HBase
  1. HBase
  2. HBASE-4532

Avoid top row seek by dedicated bloom filter for delete family bloom filter

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      The previous jira, HBASE-4469, is to avoid the top row seek operation if row-col bloom filter is enabled.
      This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

      The only subtle use case is when we are interested in the top row with empty column.

      For example,
      we are interested in row1/cf1:/1/put.
      So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
      Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
      In this way, we have already missed the real kv we are interested in.

      The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

      Evaluation from TestSeekOptimization:
      Previously:
      For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
      For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.HBASE-4469

      ================================================

      After this change:
      For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
      For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

      So we can get about 10% more seek savings for ALL kinds of bloom filter.

      1. hbase-4532-remove-system.out.println.patch
        0.6 kB
        Jonathan Hsieh
      2. HBASE-4532-apache-trunk.patch
        68 kB
        Liyin Tang
      3. hbase-4532-89-fb.patch
        64 kB
        Liyin Tang
      4. ASF.LICENSE.NOT.GRANTED--D27.1.patch
        56 kB
        noreply@reviews.facebook.net
      5. ASF.LICENSE.NOT.GRANTED--D27.1.patch
        56 kB
        noreply@reviews.facebook.net

        Issue Links

          Activity

          Hide
          stack added a comment -

          I like this idea. Nice one Liyin.

          Show
          stack added a comment - I like this idea. Nice one Liyin.
          Hide
          Jonathan Gray added a comment -

          Whoo! +1

          Show
          Jonathan Gray added a comment - Whoo! +1
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for row with empty column.

          Previous solution is to create the dedicated bloom filter for delete family, which does not work if there is a row with empty column.
          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The root cause is that even there is no delete family at top row, we still cannot avoid the top row seek.
          We can ONLY avoid the top row seek when there is no row with empty column, no matter what kind of kv type (delete/deleteCol/deleteFamily/put).
          So the current solution is to create the dedicate bloom filter for row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for row with empty column. Previous solution is to create the dedicated bloom filter for delete family, which does not work if there is a row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The root cause is that even there is no delete family at top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with empty column, no matter what kind of kv type (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicate bloom filter for row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-15 18:31:25.852446)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary (updated)
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for row with empty column.

          Previous solution is to create the dedicated bloom filter for delete family, which does not work if there is a row with empty column.
          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The root cause is that even there is no delete family at top row, we still cannot avoid the top row seek.
          We can ONLY avoid the top row seek when there is no row with empty column, no matter what kind of kv type (delete/deleteCol/deleteFamily/put).
          So the current solution is to create the dedicate bloom filter for row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-15 18:31:25.852446) Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg. Summary (updated) ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for row with empty column. Previous solution is to create the dedicated bloom filter for delete family, which does not work if there is a row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The root cause is that even there is no delete family at top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with empty column, no matter what kind of kv type (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicate bloom filter for row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-15 18:33:14.858552)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary (updated)
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-15 18:33:14.858552) Review request for hbase, Dhruba Borthakur, Michael Stack, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg. Summary (updated) ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-15 19:06:53.079380)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-15 19:06:53.079380) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-18 20:38:41.990053)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary (updated)
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-18 20:38:41.990053) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary (updated) ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          noreply@reviews.facebook.net added a comment -

          Liyin requested code review of "[jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter".
          Reviewers: DUMMY_REVIEWER

          hbase 4532 89

          <a href="https://issues.apache.org/jira/browse/HBASE-4469" title="Avoid top row seek by looking up bloomfilter"><del>HBASE-4469</del></a> avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          http://reviews.facebook.net/D27

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java

          MANAGE HERALD DIFFERENTIAL RULES
          http://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          http://reviews.facebook.net/herald/transcript/57/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          noreply@reviews.facebook.net added a comment - Liyin requested code review of " [jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter". Reviewers: DUMMY_REVIEWER hbase 4532 89 <a href="https://issues.apache.org/jira/browse/HBASE-4469" title="Avoid top row seek by looking up bloomfilter"><del> HBASE-4469 </del></a> avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. TEST PLAN EMPTY REVISION DETAIL http://reviews.facebook.net/D27 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/KeyValue.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java MANAGE HERALD DIFFERENTIAL RULES http://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? http://reviews.facebook.net/herald/transcript/57/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
          Hide
          noreply@reviews.facebook.net added a comment -

          Liyin requested code review of "[jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter".
          Reviewers: DUMMY_REVIEWER

          hbase 4532 89

          <a href="https://issues.apache.org/jira/browse/HBASE-4469" title="Avoid top row seek by looking up bloomfilter"><del>HBASE-4469</del></a> avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          TEST PLAN
          EMPTY

          REVISION DETAIL
          http://reviews.facebook.net/D27

          AFFECTED FILES
          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java

          MANAGE HERALD DIFFERENTIAL RULES
          http://reviews.facebook.net/herald/view/differential/

          WHY DID I GET THIS EMAIL?
          http://reviews.facebook.net/herald/transcript/57/

          Tip: use the X-Herald-Rules header to filter Herald messages in your client.

          Show
          noreply@reviews.facebook.net added a comment - Liyin requested code review of " [jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter". Reviewers: DUMMY_REVIEWER hbase 4532 89 <a href="https://issues.apache.org/jira/browse/HBASE-4469" title="Avoid top row seek by looking up bloomfilter"><del> HBASE-4469 </del></a> avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. TEST PLAN EMPTY REVISION DETAIL http://reviews.facebook.net/D27 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/KeyValue.java src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java MANAGE HERALD DIFFERENTIAL RULES http://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? http://reviews.facebook.net/herald/transcript/57/ Tip: use the X-Herald-Rules header to filter Herald messages in your client.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2615
          -----------------------------------------------------------

          Hi Liyin-- find the first pass review comments inlined. Haven't reviewed the test changes yet. Looking fwd to this optimization landing.

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <https://reviews.apache.org/r/2393/#comment5908>

          remove "qualifier" from the comment, since all we are passing here is row and family (no column name).

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <https://reviews.apache.org/r/2393/#comment5909>

          • remove qualifier from comment here too.
          • 80 char issue

          src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <https://reviews.apache.org/r/2393/#comment5910>

          remove qualifier param.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          <https://reviews.apache.org/r/2393/#comment5913>

          80 char issues.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          <https://reviews.apache.org/r/2393/#comment5914>

          can we enhance HFilePrettyPrinter to report info about the DeleteBloomFilter as well (provided HFile is V2).

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment5985>

          even though -> even if

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment5987>

          what's the differerence between getPath() (in line 1027) and this.writer.getPath()? Did you mean to log the general & delete Bloom filter instead?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6015>

          not clear where you are using this "-1" state

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6017>

          or this is no -> or there is no

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6016>

          To make sure I understand this...

          for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That look's fine. Just not obvious.

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6014>

          space between cnt & !=

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6020>

          did you intend to initialize bloomTypeLog here as well?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6021>

          bloomTypeLog is only initialized for GeneralBloomFilter case. If that's the intent, why not move the logging near line 1382?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          <https://reviews.apache.org/r/2393/#comment6027>

          In case there is a deleteFamily kv, there are two sub-cases here...

          a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself.

          b) we have a DeleteFamilyBloomFilter.

          I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          <https://reviews.apache.org/r/2393/#comment6023>

          isSeekToEmptyColumn and useBloom should be separate flags I think.

          For example, if the CF had ROWCOL bloom, and the query for looking for "row/0-length column", then with this change, we won't use the ROWCOL bloom filter even when it exists.

          Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true?

          • Kannan

          On 2011-10-18 20:38:41, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-18 20:38:41)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2615 ----------------------------------------------------------- Hi Liyin-- find the first pass review comments inlined. Haven't reviewed the test changes yet. Looking fwd to this optimization landing. src/main/java/org/apache/hadoop/hbase/KeyValue.java < https://reviews.apache.org/r/2393/#comment5908 > remove "qualifier" from the comment, since all we are passing here is row and family (no column name). src/main/java/org/apache/hadoop/hbase/KeyValue.java < https://reviews.apache.org/r/2393/#comment5909 > remove qualifier from comment here too. 80 char issue src/main/java/org/apache/hadoop/hbase/KeyValue.java < https://reviews.apache.org/r/2393/#comment5910 > remove qualifier param. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java < https://reviews.apache.org/r/2393/#comment5913 > 80 char issues. src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java < https://reviews.apache.org/r/2393/#comment5914 > can we enhance HFilePrettyPrinter to report info about the DeleteBloomFilter as well (provided HFile is V2). src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment5985 > even though -> even if src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment5987 > what's the differerence between getPath() (in line 1027) and this.writer.getPath()? Did you mean to log the general & delete Bloom filter instead? src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6015 > not clear where you are using this "-1" state src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6017 > or this is no -> or there is no src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6016 > To make sure I understand this... for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That look's fine. Just not obvious. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6014 > space between cnt & != src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6020 > did you intend to initialize bloomTypeLog here as well? src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6021 > bloomTypeLog is only initialized for GeneralBloomFilter case. If that's the intent, why not move the logging near line 1382? src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java < https://reviews.apache.org/r/2393/#comment6027 > In case there is a deleteFamily kv, there are two sub-cases here... a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself. b) we have a DeleteFamilyBloomFilter. I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no? src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java < https://reviews.apache.org/r/2393/#comment6023 > isSeekToEmptyColumn and useBloom should be separate flags I think. For example, if the CF had ROWCOL bloom, and the query for looking for "row/0-length column", then with this change, we won't use the ROWCOL bloom filter even when it exists. Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true? Kannan On 2011-10-18 20:38:41, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-18 20:38:41) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java a1d7de5 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java c88b23f src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-20 00:08:14.459108)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Changes
          -------

          Thanks for Kannan's review.
          Update the diff to address Kannan's comments.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 00:08:14.459108) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes ------- Thanks for Kannan's review. Update the diff to address Kannan's comments. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1058

          > <https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1058>

          >

          > not clear where you are using this "-1" state

          Even if there is no delete family bloom filter, the Store file will still count how many delete family key values and append this information into HFile's File info.
          So when reading the file, we will know how many delete family kvs.

          However, if there is no this delete family field in the file info, deleteFamilyCnt shall be set to -1. So the function passesDeleteFamilyBloomFilter won't take this into account.

          On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1217

          > <https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1217>

          >

          > To make sure I understand this...

          >

          > for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That look's fine. Just not obvious.

          Yes If there is a deleteFamilyCnt and the deleteFamilyCnt is 0, then there is no need to check Bloom filter and return false for function passesDeleteFamilyBloomFilter(). It means there is no need to seek this store file for delete family with the row.

          if the deleteFamilyCnt is not initialized properly for some reason, which is set to -1, then it needs to check the delete family bloom filter.
          So there is no delete family bloom filter, it will return true. It means it is possible that there is a delete family for this row.

          On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 238

          > <https://reviews.apache.org/r/2393/diff/2/?file=50559#file50559line238>

          >

          > In case there is a deleteFamily kv, there are two sub-cases here...

          >

          > a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself.

          >

          > b) we have a DeleteFamilyBloomFilter.

          >

          > I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no?

          1) Yes, it should the ROWCOL Bloom filter. It can also help to warm up row col bloom filter in the cache OR get benefit from block cache.
          I will update the code.
          2) There is no regression for the ROWCOL bloom case. It is because we only count for data block seek number.
          No matter which bloom filter (delete family or row col), it will return the same result. So it won't affect the decision whether to seek to the store file file or not.
          Please correct me if I am wrong

          On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 111

          > <https://reviews.apache.org/r/2393/diff/2/?file=50560#file50560line111>

          >

          > isSeekToEmptyColumn and useBloom should be separate flags I think.

          >

          > For example, if the CF had ROWCOL bloom, and the query for looking for "row/0-length column", then with this change, we won't use the ROWCOL bloom filter even when it exists.

          >

          > Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true?

          Agree I will update the code to pass the scan query matcher to each store file scanner. Also this will help us for further optimization.
          When the store file scanner has more information about the matcher's status, it may help to avoid more unnecessarily seeks.

          • Liyin

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2615
          -----------------------------------------------------------

          On 2011-10-20 00:08:14, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 00:08:14)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Running all the unit tests now

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1058 > < https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1058 > > > not clear where you are using this "-1" state Even if there is no delete family bloom filter, the Store file will still count how many delete family key values and append this information into HFile's File info. So when reading the file, we will know how many delete family kvs. However, if there is no this delete family field in the file info, deleteFamilyCnt shall be set to -1. So the function passesDeleteFamilyBloomFilter won't take this into account. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 1217 > < https://reviews.apache.org/r/2393/diff/2/?file=50558#file50558line1217 > > > To make sure I understand this... > > for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a bloomFilter will cause it to return true. That look's fine. Just not obvious. Yes If there is a deleteFamilyCnt and the deleteFamilyCnt is 0, then there is no need to check Bloom filter and return false for function passesDeleteFamilyBloomFilter(). It means there is no need to seek this store file for delete family with the row. if the deleteFamilyCnt is not initialized properly for some reason, which is set to -1, then it needs to check the delete family bloom filter. So there is no delete family bloom filter, it will return true. It means it is possible that there is a delete family for this row. On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 238 > < https://reviews.apache.org/r/2393/diff/2/?file=50559#file50559line238 > > > In case there is a deleteFamily kv, there are two sub-cases here... > > a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter) and we want to use the ROWCOL bloom filter itself. > > b) we have a DeleteFamilyBloomFilter. > > I don't see us taking advantage of (a) like we used to earlier. Isn't this a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have caught it, no? 1) Yes, it should the ROWCOL Bloom filter. It can also help to warm up row col bloom filter in the cache OR get benefit from block cache. I will update the code. 2) There is no regression for the ROWCOL bloom case. It is because we only count for data block seek number. No matter which bloom filter (delete family or row col), it will return the same result. So it won't affect the decision whether to seek to the store file file or not. Please correct me if I am wrong On 2011-10-19 19:02:46, Kannan Muthukkaruppan wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java, line 111 > < https://reviews.apache.org/r/2393/diff/2/?file=50560#file50560line111 > > > isSeekToEmptyColumn and useBloom should be separate flags I think. > > For example, if the CF had ROWCOL bloom, and the query for looking for "row/0-length column", then with this change, we won't use the ROWCOL bloom filter even when it exists. > > Isn't it the case that we want to avoid using only the deleteFamilyBloom filter when isSeekToEmptyColumn is true? Agree I will update the code to pass the scan query matcher to each store file scanner. Also this will help us for further optimization. When the store file scanner has more information about the matcher's status, it may help to avoid more unnecessarily seeks. Liyin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2615 ----------------------------------------------------------- On 2011-10-20 00:08:14, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 00:08:14) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Running all the unit tests now Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:26.190655)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing (updated)
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 03:46:26.190655) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing (updated) ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2693
          -----------------------------------------------------------

          Nice work Liyin! I've been wanting this feature for so long! Some minor comments but looks good.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          <https://reviews.apache.org/r/2393/#comment6069>

          is this right to return IOE and not null like if it doesn't exist in the general bloom case?

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          <https://reviews.apache.org/r/2393/#comment6070>

          missing leading space

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6071>

          can you describe what an empty column means? does this mean wildcard or does this mean the null column?

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6072>

          this means the null qualifier?

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6073>

          is there ever a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom.

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6074>

          flipped args

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          <https://reviews.apache.org/r/2393/#comment6076>

          this enables the creation or the usage?

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          <https://reviews.apache.org/r/2393/#comment6075>

          stale comment from general method

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          <https://reviews.apache.org/r/2393/#comment6077>

          it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways?

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          <https://reviews.apache.org/r/2393/#comment6078>

          you seem to be setting the conf to 0.01 and then retrieving it back?

          • Jonathan

          On 2011-10-20 03:46:26, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:26)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2693 ----------------------------------------------------------- Nice work Liyin! I've been wanting this feature for so long! Some minor comments but looks good. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java < https://reviews.apache.org/r/2393/#comment6069 > is this right to return IOE and not null like if it doesn't exist in the general bloom case? src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java < https://reviews.apache.org/r/2393/#comment6070 > missing leading space src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6071 > can you describe what an empty column means? does this mean wildcard or does this mean the null column? src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6072 > this means the null qualifier? src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6073 > is there ever a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6074 > flipped args src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java < https://reviews.apache.org/r/2393/#comment6076 > this enables the creation or the usage? src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java < https://reviews.apache.org/r/2393/#comment6075 > stale comment from general method src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java < https://reviews.apache.org/r/2393/#comment6077 > it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways? src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java < https://reviews.apache.org/r/2393/#comment6078 > you seem to be setting the conf to 0.01 and then retrieving it back? Jonathan On 2011-10-20 03:46:26, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 03:46:26) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2695
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6079>

          If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ?

          • Ted

          On 2011-10-20 03:46:26, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:26)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2695 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6079 > If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ? Ted On 2011-10-20 03:46:26, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 03:46:26) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 792

          > <https://reviews.apache.org/r/2393/diff/3/?file=51375#file51375line792>

          >

          > is there ever a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom.

          Yes, We can disable this by conf.setBoolean(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, false);

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98

          > <https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98>

          >

          > this means the null qualifier?

          yes. I have updated the comments

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java, lines 677-678

          > <https://reviews.apache.org/r/2393/diff/3/?file=51370#file51370line677>

          >

          > is this right to return IOE and not null like if it doesn't exist in the general bloom case?

          agreed

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 67-68

          > <https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line67>

          >

          > can you describe what an empty column means? does this mean wildcard or does this mean the null column?

          yes. I have updated the comments

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java, line 73

          > <https://reviews.apache.org/r/2393/diff/3/?file=51378#file51378line73>

          >

          > this enables the creation or the usage?

          This enable the creation.

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java, line 26

          > <https://reviews.apache.org/r/2393/diff/3/?file=51379#file51379line26>

          >

          > it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways?

          It expects no number goes down It shows we can avoid the top row seek even there is ROW/NONE bloom filter.

          Previously, this unit test only enabled the ROWCOL bloom filter for HBASE-4469 (Avoid top row seek by looking up row_col bloomfilter)

          But right now, in the TestBlocksRead, it will check the number seeks for ROWCOL, ROW and NONE Bloom filter one by one.
          No matter what Bloom filter the CF is using, we always avoid the top row seek

          On 2011-10-20 04:55:44, Jonathan Gray wrote:

          > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java, line 401

          > <https://reviews.apache.org/r/2393/diff/3/?file=51381#file51381line401>

          >

          > you seem to be setting the conf to 0.01 and then retrieving it back?

          Yes. I try to be consistent with other bloom filter unit tests.
          So set the same error rate as testBloomFilter() function.

          • Liyin

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2693
          -----------------------------------------------------------

          On 2011-10-20 03:46:26, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:26)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 792 > < https://reviews.apache.org/r/2393/diff/3/?file=51375#file51375line792 > > > is there ever a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom. Yes, We can disable this by conf.setBoolean(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, false); On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98 > < https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98 > > > this means the null qualifier? yes. I have updated the comments On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java, lines 677-678 > < https://reviews.apache.org/r/2393/diff/3/?file=51370#file51370line677 > > > is this right to return IOE and not null like if it doesn't exist in the general bloom case? agreed On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 67-68 > < https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line67 > > > can you describe what an empty column means? does this mean wildcard or does this mean the null column? yes. I have updated the comments On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java, line 73 > < https://reviews.apache.org/r/2393/diff/3/?file=51378#file51378line73 > > > this enables the creation or the usage? This enable the creation. On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java, line 26 > < https://reviews.apache.org/r/2393/diff/3/?file=51379#file51379line26 > > > it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways? It expects no number goes down It shows we can avoid the top row seek even there is ROW/NONE bloom filter. Previously, this unit test only enabled the ROWCOL bloom filter for HBASE-4469 (Avoid top row seek by looking up row_col bloomfilter) But right now, in the TestBlocksRead, it will check the number seeks for ROWCOL, ROW and NONE Bloom filter one by one. No matter what Bloom filter the CF is using, we always avoid the top row seek On 2011-10-20 04:55:44, Jonathan Gray wrote: > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java, line 401 > < https://reviews.apache.org/r/2393/diff/3/?file=51381#file51381line401 > > > you seem to be setting the conf to 0.01 and then retrieving it back? Yes. I try to be consistent with other bloom filter unit tests. So set the same error rate as testBloomFilter() function. Liyin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2693 ----------------------------------------------------------- On 2011-10-20 03:46:26, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 03:46:26) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-10-20 05:07:00, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98

          > <https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98>

          >

          > If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ?

          I have updated comments for this variable:

          • This variable shows whether there is an null column in the query.
          • There is always a null column in the wildcard column query.
          • There maybe exits a null column in the explicit column query based on the
          • first column.
          • Liyin

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2695
          -----------------------------------------------------------

          On 2011-10-20 03:46:26, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 03:46:26)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-10-20 05:07:00, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98 > < https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98 > > > If hasEmptyColumn is true, shall we be using ScanWildcardColumnTracker (as in the if block) ? I have updated comments for this variable: This variable shows whether there is an null column in the query. There is always a null column in the wildcard column query. There maybe exits a null column in the explicit column query based on the first column. Liyin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2695 ----------------------------------------------------------- On 2011-10-20 03:46:26, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 03:46:26) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-20 17:35:38.480668)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Changes
          -------

          Thanks Jonathan and Ted's review. Address their comments.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 17:35:38.480668) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes ------- Thanks Jonathan and Ted's review. Address their comments. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-20 18:45:51.242661)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 18:45:51.242661) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2720
          -----------------------------------------------------------

          Ship it!

          Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work!

          • Jonathan

          On 2011-10-20 18:45:51, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 18:45:51)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2720 ----------------------------------------------------------- Ship it! Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work! Jonathan On 2011-10-20 18:45:51, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 18:45:51) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-10-20 19:47:02, Jonathan Gray wrote:

          > Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work!

          thanks Jonathan for reviewing. I will submit a patch against apache trunk

          • Liyin

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2720
          -----------------------------------------------------------

          On 2011-10-20 18:45:51, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 18:45:51)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-10-20 19:47:02, Jonathan Gray wrote: > Looks good! Want to attach the various branch patches to the JIRA? (As you have them). I can help rebase on trunk if you'd like, let me know. Nice work! thanks Jonathan for reviewing. I will submit a patch against apache trunk Liyin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2720 ----------------------------------------------------------- On 2011-10-20 18:45:51, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 18:45:51) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2734
          -----------------------------------------------------------

          done with 2nd pass of review. Final comments inlined. This is looking really good.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          <https://reviews.apache.org/r/2393/#comment6151>

          getGeneralBloom -> getDeleteBloom?

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6152>

          exits -> exist

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6154>

          tomorrow someone may introduce a new constructor which forgets to initialize this variable, and it'll default to false, which is an unsafe default for this variable.

          Let's do something like:

          private boolean hasNullColumn = true; // initialize conservatively

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6153>

          simplify to:

          hasNullColumn = (columnSet.first().length == 0);

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <https://reviews.apache.org/r/2393/#comment6155>

          has -> was

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          <https://reviews.apache.org/r/2393/#comment6157>

          This function only does DeleteFamily Bloom. So, the comment needs to be updated to remove ROWCOL & ROW.

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          <https://reviews.apache.org/r/2393/#comment6158>

          nice enhancement to the test!

          • Kannan

          On 2011-10-20 18:45:51, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-20 18:45:51)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2734 ----------------------------------------------------------- done with 2nd pass of review. Final comments inlined. This is looking really good. src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java < https://reviews.apache.org/r/2393/#comment6151 > getGeneralBloom -> getDeleteBloom? src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6152 > exits -> exist src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6154 > tomorrow someone may introduce a new constructor which forgets to initialize this variable, and it'll default to false, which is an unsafe default for this variable. Let's do something like: private boolean hasNullColumn = true; // initialize conservatively src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6153 > simplify to: hasNullColumn = (columnSet.first().length == 0); src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java < https://reviews.apache.org/r/2393/#comment6155 > has -> was src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java < https://reviews.apache.org/r/2393/#comment6157 > This function only does DeleteFamily Bloom. So, the comment needs to be updated to remove ROWCOL & ROW. src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java < https://reviews.apache.org/r/2393/#comment6158 > nice enhancement to the test! Kannan On 2011-10-20 18:45:51, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-20 18:45:51) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-21 19:49:16.120589)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Changes
          -------

          Thanks Kannan Update the diff to address Kannan's review

          Summary
          -------

          HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-21 19:49:16.120589) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes ------- Thanks Kannan Update the diff to address Kannan's review Summary ------- HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-21 19:58:05.693922)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary (updated)
          -------

          The previous jira, HBASE-4469, is to avoid the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          Evaluation from TestSeekOptimization:
          Previously:
          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.HBASE-4469

          ================================================

          After this change:
          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings for ALL kinds of bloom filter.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-21 19:58:05.693922) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary (updated) ------- The previous jira, HBASE-4469 , is to avoid the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled. HBASE-4469 ================================================ After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings for ALL kinds of bloom filter. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/#review2758
          -----------------------------------------------------------

          Ship it!

          +1. One typo below.

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          <https://reviews.apache.org/r/2393/#comment6199>

          exits -> exist

          • Kannan

          On 2011-10-21 19:58:05, Liyin Tang wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2393/

          -----------------------------------------------------------

          (Updated 2011-10-21 19:58:05)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Summary

          -------

          The previous jira, HBASE-4469, is to avoid the top row seek operation if row-col bloom filter is enabled.

          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,

          we are interested in row1/cf1:/1/put.

          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.

          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).

          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          Evaluation from TestSeekOptimization:

          Previously:

          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.HBASE-4469

          ================================================

          After this change:

          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings for ALL kinds of bloom filter.

          This addresses bug HBASE-4532.

          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs

          -----

          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb

          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9

          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c

          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812

          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65

          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163

          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing

          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/#review2758 ----------------------------------------------------------- Ship it! +1. One typo below. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java < https://reviews.apache.org/r/2393/#comment6199 > exits -> exist Kannan On 2011-10-21 19:58:05, Liyin Tang wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-21 19:58:05) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Summary ------- The previous jira, HBASE-4469 , is to avoid the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled. HBASE-4469 ================================================ After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings for ALL kinds of bloom filter. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs ----- src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2393/
          -----------------------------------------------------------

          (Updated 2011-10-21 20:01:23.160793)

          Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.

          Changes
          -------

          Fix the typo

          Summary
          -------

          The previous jira, HBASE-4469, is to avoid the top row seek operation if row-col bloom filter is enabled.
          This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family

          The only subtle use case is when we are interested in the top row with empty column.

          For example,
          we are interested in row1/cf1:/1/put.
          So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
          Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
          In this way, we have already missed the real kv we are interested in.

          The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.

          Evaluation from TestSeekOptimization:
          Previously:
          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled.HBASE-4469

          ================================================

          After this change:
          For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
          For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

          So we can get about 10% more seek savings for ALL kinds of bloom filter.

          This addresses bug HBASE-4532.
          https://issues.apache.org/jira/browse/HBASE-4532

          Diffs (updated)


          src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
          src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
          src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
          src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
          src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
          src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
          src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8

          Diff: https://reviews.apache.org/r/2393/diff

          Testing
          -------

          Passed all the unit tests

          Thanks,

          Liyin

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2393/ ----------------------------------------------------------- (Updated 2011-10-21 20:01:23.160793) Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg. Changes ------- Fix the typo Summary ------- The previous jira, HBASE-4469 , is to avoid the top row seek operation if row-col bloom filter is enabled. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family The only subtle use case is when we are interested in the top row with empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled. HBASE-4469 ================================================ After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we can get about 10% more seek savings for ALL kinds of bloom filter. This addresses bug HBASE-4532 . https://issues.apache.org/jira/browse/HBASE-4532 Diffs (updated) src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518 src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9 src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3 src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812 src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65 src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163 src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8 Diff: https://reviews.apache.org/r/2393/diff Testing ------- Passed all the unit tests Thanks, Liyin
          Hide
          Liyin Tang added a comment -

          For 89-fb, all the unit tests are passed.
          For apache-trunk, there are 2 unit tests failed with and without my change:
          TestHCM and TestDistributedLogSpliting

          Show
          Liyin Tang added a comment - For 89-fb, all the unit tests are passed. For apache-trunk, there are 2 unit tests failed with and without my change: TestHCM and TestDistributedLogSpliting
          Hide
          Ted Yu added a comment -

          Thanks for running the test suites, Liyin.

          TRUNK build 2358 passed.
          Though I did see this in build 2357:
          https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2357/testReport/junit/org.apache.hadoop.hbase.client/TestHCM/testConnectionUniqueness/

          Please tell us the subtest that failed for the above two tests.

          Show
          Ted Yu added a comment - Thanks for running the test suites, Liyin. TRUNK build 2358 passed. Though I did see this in build 2357: https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2357/testReport/junit/org.apache.hadoop.hbase.client/TestHCM/testConnectionUniqueness/ Please tell us the subtest that failed for the above two tests.
          Hide
          Liyin Tang added a comment -

          Thanks Ted
          here is the test results I got.
          So the testConnectionUniqueness in TestHCM has been fixed now ?

          ==============================================
          Results :

          Tests in error:
          testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM)
          testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): Unexpected exception, expected<org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException> but was<java.lang.NullPointerException>
          testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
          testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/dfs/name1/current/fsimage (Too many open files)
          testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)
          testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/3949c75c-8c23-4513-b1cc-e94b1bba640b/dfs/name1/current/fsimage (Too many open files)
          testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting)

          Tests run: 1056, Failures: 0, Errors: 7, Skipped: 9

          Show
          Liyin Tang added a comment - Thanks Ted here is the test results I got. So the testConnectionUniqueness in TestHCM has been fixed now ? ============================================== Results : Tests in error: testConnectionUniqueness(org.apache.hadoop.hbase.client.TestHCM) testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): Unexpected exception, expected<org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException> but was<java.lang.NullPointerException> testOrphanLogCreation(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/dfs/name1/current/fsimage (Too many open files) testRecoveredEdits(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): /data/users/liyintang/hbase-os-trunk/target/test-data/3d058c80-266a-4164-8143-925d514f016e/09d560d3-254e-4986-abe1-22b876d299f1/4758e332-2ae7-4194-bfea-900ee4a2e3ab/3949c75c-8c23-4513-b1cc-e94b1bba640b/dfs/name1/current/fsimage (Too many open files) testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Tests run: 1056, Failures: 0, Errors: 7, Skipped: 9
          Hide
          Ted Yu added a comment -

          TestHCM wasn't fixed. If the test fails consistently, maybe you can help debug it.
          For the other test failures, it seems ulimit on the machine performing tests has to
          be increased.

          Show
          Ted Yu added a comment - TestHCM wasn't fixed. If the test fails consistently, maybe you can help debug it. For the other test failures, it seems ulimit on the machine performing tests has to be increased.
          Hide
          Nicolas Spiegelberg added a comment -

          +1 on commit. TestHCM is an issue unrelated to this JIRA and shouldn't hold it up. Should use 'git bisect' to figure out where it was introduced and comment on that JIRA.

          Show
          Nicolas Spiegelberg added a comment - +1 on commit. TestHCM is an issue unrelated to this JIRA and shouldn't hold it up. Should use 'git bisect' to figure out where it was introduced and comment on that JIRA.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2363 (See https://builds.apache.org/job/HBase-TRUNK/2363/)
          HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter

          nspiegelberg :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2363 (See https://builds.apache.org/job/HBase-TRUNK/2363/ ) HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter nspiegelberg : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
          Hide
          Jonathan Hsieh added a comment -

          This seems to be checked into trunk now and there seems to be an extraneous System.out.println that is causing some of my tests to "fail" when run from maven (apparently maven buffers in memory instead of writing it out as a test is executing).

          Here's the OOME that maven reports:

          Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap spaceat java.util.Arrays.copyOf(Arrays.java:2882)at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)at java.lang.StringBuffer.append(StringBuffer.java:224)at org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)at org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)at org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)at org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)at java.lang.Thread.run(Thread.java:662) man

          I've attached a patch eliminates this issue.

          Show
          Jonathan Hsieh added a comment - This seems to be checked into trunk now and there seems to be an extraneous System.out.println that is causing some of my tests to "fail" when run from maven (apparently maven buffers in memory instead of writing it out as a test is executing). Here's the OOME that maven reports: Exception in thread "ThreadedStreamConsumer" java.lang.OutOfMemoryError: Java heap spaceat java.util.Arrays.copyOf(Arrays.java:2882)at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)at java.lang.StringBuffer.append(StringBuffer.java:224)at org.apache.maven.surefire.report.ConsoleOutputFileReporter.writeMessage(ConsoleOutputFileReporter.java:115)at org.apache.maven.surefire.report.MulticastingReporter.writeMessage(MulticastingReporter.java:101)at org.apache.maven.surefire.report.TestSetRunListener.writeTestOutput(TestSetRunListener.java:99)at org.apache.maven.plugin.surefire.booterclient.output.ForkClient.consumeLine(ForkClient.java:132)at org.apache.maven.plugin.surefire.booterclient.output.ThreadedStreamConsumer$Pumper.run(ThreadedStreamConsumer.java:67)at java.lang.Thread.run(Thread.java:662) man I've attached a patch eliminates this issue.
          Hide
          Jonathan Hsieh added a comment -

          Attached file has the system out line removed.

          Show
          Jonathan Hsieh added a comment - Attached file has the system out line removed.
          Hide
          Liyin Tang added a comment -

          Thanks Jonathan for the patch. I should remove this line out.

          Show
          Liyin Tang added a comment - Thanks Jonathan for the patch. I should remove this line out.
          Hide
          Ted Yu added a comment -

          Integrated addendum to TRUNK.

          Thanks Jonathan.

          Show
          Ted Yu added a comment - Integrated addendum to TRUNK. Thanks Jonathan.
          Hide
          Jonathan Gray added a comment -

          Please stop doing multiple commits on the same JIRA! I thought we agreed on this, or no?

          Show
          Jonathan Gray added a comment - Please stop doing multiple commits on the same JIRA! I thought we agreed on this, or no?
          Hide
          Ted Yu added a comment -

          This JIRA wasn't closed before applying the addendum.
          May I ask why this JIRA was integrated on the 24th without announcement ?

          Show
          Ted Yu added a comment - This JIRA wasn't closed before applying the addendum. May I ask why this JIRA was integrated on the 24th without announcement ?
          Hide
          Jonathan Gray added a comment -

          I don't think JIRA being open/closed is the issue, it's more multiple commits.

          But yeah, as a separate note, looks like there was no final comment and resolution after the commit.

          Show
          Jonathan Gray added a comment - I don't think JIRA being open/closed is the issue, it's more multiple commits. But yeah, as a separate note, looks like there was no final comment and resolution after the commit.
          Hide
          Ted Yu added a comment -

          I would interpret the JIRA not being resolved as anticipation for an addendum

          Show
          Ted Yu added a comment - I would interpret the JIRA not being resolved as anticipation for an addendum
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2380 (See https://builds.apache.org/job/HBase-TRUNK/2380/)
          HBASE-4532 remove system.out.println (Jonathan Hsieh)

          tedyu :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2380 (See https://builds.apache.org/job/HBase-TRUNK/2380/ ) HBASE-4532 remove system.out.println (Jonathan Hsieh) tedyu : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          Hide
          Liyin Tang added a comment -

          Thank Ted, Jonathan Gray for committing this.
          I will double check the submitted patch to avoid this problem.

          Nice Catch Jonathan Hsieh. Thank you for the patch

          Show
          Liyin Tang added a comment - Thank Ted, Jonathan Gray for committing this. I will double check the submitted patch to avoid this problem. Nice Catch Jonathan Hsieh. Thank you for the patch
          Hide
          Ted Yu added a comment -

          Looks like CHANGES.txt wasn't updated to include this JIRA.

          Show
          Ted Yu added a comment - Looks like CHANGES.txt wasn't updated to include this JIRA.
          Hide
          Liyin Tang added a comment -

          Shall we add an incompatible flag for this jira?
          Because adding a new block type is not backward compatible.

          Show
          Liyin Tang added a comment - Shall we add an incompatible flag for this jira? Because adding a new block type is not backward compatible.
          Hide
          Ted Yu added a comment -

          @Liyin:
          Can you update Release Notes ?
          Thanks

          Show
          Ted Yu added a comment - @Liyin: Can you update Release Notes ? Thanks
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2397 (See https://builds.apache.org/job/HBase-TRUNK/2397/)
          Fixed CHANGES file for HBASE-4532 & HBASE-4611

          nspiegelberg :
          Files :

          • /hbase/trunk/CHANGES.txt
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2397 (See https://builds.apache.org/job/HBase-TRUNK/2397/ ) Fixed CHANGES file for HBASE-4532 & HBASE-4611 nspiegelberg : Files : /hbase/trunk/CHANGES.txt
          Hide
          Phabricator added a comment -

          Liyin has abandoned the revision "[jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter".

          abandon the stale revision.

          REVISION DETAIL
          https://reviews.facebook.net/D27

          Show
          Phabricator added a comment - Liyin has abandoned the revision " [jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter". abandon the stale revision. REVISION DETAIL https://reviews.facebook.net/D27
          Hide
          Phabricator added a comment -

          Liyin has abandoned the revision "[jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter".

          abandon the stale revision.

          REVISION DETAIL
          https://reviews.facebook.net/D27

          Show
          Phabricator added a comment - Liyin has abandoned the revision " [jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter". abandon the stale revision. REVISION DETAIL https://reviews.facebook.net/D27
          Hide
          Phabricator added a comment -

          Liyin has abandoned the revision "[jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter".

          abandon the stale revision.

          REVISION DETAIL
          https://reviews.facebook.net/D27

          Show
          Phabricator added a comment - Liyin has abandoned the revision " [jira] HBASE-4532 Avoid top row seek by dedicated bloom filter for delete family bloom filter". abandon the stale revision. REVISION DETAIL https://reviews.facebook.net/D27
          Hide
          stack added a comment -

          This feature looks like its always on (which would make sense). Can you confirm Liyin? Thanks.

          Show
          stack added a comment - This feature looks like its always on (which would make sense). Can you confirm Liyin? Thanks.

            People

            • Assignee:
              Liyin Tang
              Reporter:
              Liyin Tang
            • Votes:
              2 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development