HBase
  1. HBase
  2. HBASE-4496

HFile V2 does not honor setCacheBlocks when scanning.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0, 0.94.0
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      While testing the LRU cache during the scanning I noticed quite some churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.

      Here's a trace (from Eclipse) showing the problem:

      HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
      HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
      HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
      HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
      HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
      StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
      StoreFileScanner.reseek(KeyValue) line: 110
      KeyValueHeap.reseek(KeyValue) line: 255
      StoreScanner.reseek(KeyValue) line: 409
      StoreScanner.next(List<KeyValue>, int) line: 304
      KeyValueHeap.next(List<KeyValue>, int) line: 114
      KeyValueHeap.next(List<KeyValue>) line: 143
      HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
      HRegion$RegionScannerImpl.nextInternal(int) line: 2722
      HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
      HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
      HRegionServer.next(long, int) line: 2092

      Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

      The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching.

      Avoiding caching during scans is somewhat important for us.

      1. 4496.addendum
        3 kB
        Ted Yu
      2. 4496.final
        19 kB
        Ted Yu
      3. 4496.txt
        14 kB
        Lars Hofhansl

        Issue Links

          Activity

          Hide
          Lars Hofhansl added a comment -

          I'll come up with a patch.

          Show
          Lars Hofhansl added a comment - I'll come up with a patch.
          Hide
          Lars Hofhansl added a comment -

          In a simple test this shaves about 20% of scanning time (with caching switched off), in addition to avoiding LRU and GC churn.

          Show
          Lars Hofhansl added a comment - In a simple test this shaves about 20% of scanning time (with caching switched off), in addition to avoiding LRU and GC churn.
          Hide
          stack added a comment -

          Good find Lars.

          Show
          stack added a comment - Good find Lars.
          Hide
          Lars Hofhansl added a comment -

          This fixes the problem for me.

          TestHFileBlock, TestCacheOnWrite, TestHFileBlockIndex, and TestHFileWriterV2 still pass.

          It's not pretty, though. Lots of places where cacheBlock now needs to be passed around.
          I erred on the side of caution, where is wasn't clear whether the block should be cached of not, I kept it being cached (as before).

          Somebody with more knowledge about HFileReaderV2 should have a look. If there's something nicer to do, please let me know.
          It is not clear to be how to write a test for this.

          Show
          Lars Hofhansl added a comment - This fixes the problem for me. TestHFileBlock, TestCacheOnWrite, TestHFileBlockIndex, and TestHFileWriterV2 still pass. It's not pretty, though. Lots of places where cacheBlock now needs to be passed around. I erred on the side of caution, where is wasn't clear whether the block should be cached of not, I kept it being cached (as before). Somebody with more knowledge about HFileReaderV2 should have a look. If there's something nicer to do, please let me know. It is not clear to be how to write a test for this.
          Hide
          Jonathan Gray added a comment -

          I can roll this in to my changes for CacheConfig. Don't have the JIRA off hand (but the point of it is to make it so we dont' have to change a bunch of constructors all the time).

          I'm hoping to have a diff out tonight or tomorrow morning.

          Show
          Jonathan Gray added a comment - I can roll this in to my changes for CacheConfig. Don't have the JIRA off hand (but the point of it is to make it so we dont' have to change a bunch of constructors all the time). I'm hoping to have a diff out tonight or tomorrow morning.
          Hide
          Lars Hofhansl added a comment -

          That sounds like a plan Jon. Let me know if you'd like me to have a look at the combined patch.

          Show
          Lars Hofhansl added a comment - That sounds like a plan Jon. Let me know if you'd like me to have a look at the combined patch.
          Hide
          Jonathan Gray added a comment -

          So this exact issue actually triggered why I was having a hard time getting TestCacheOnWrite to pass. The test was previously relying on some broken/inconsistent behavior in which it passes a single instance of a reader with a null block cache but that was removed with the latest CacheConfig stuff.

          My latest patch for HBASE-4422 actually just changes the always true to always false I'm going to talk to Mikhail tomorrow (Friday) about the issue here and see if he has any thoughts.

          Show
          Jonathan Gray added a comment - So this exact issue actually triggered why I was having a hard time getting TestCacheOnWrite to pass. The test was previously relying on some broken/inconsistent behavior in which it passes a single instance of a reader with a null block cache but that was removed with the latest CacheConfig stuff. My latest patch for HBASE-4422 actually just changes the always true to always false I'm going to talk to Mikhail tomorrow (Friday) about the issue here and see if he has any thoughts.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/
          -----------------------------------------------------------

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Summary
          -------

          This fixes a couple of long-existing code issues in HFile v2:

          • Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).
          • Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.
          • Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).
          • Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.
          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs


          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367
          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767
          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e

          Diff: https://reviews.apache.org/r/2136/diff

          Testing
          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- Review request for hbase, Jonathan Gray and Lars Hofhansl. Summary ------- This fixes a couple of long-existing code issues in HFile v2: Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367 src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/#review2232
          -----------------------------------------------------------

          Ship it!

          I love it. This is what I should have done with HBASE-4496 if I had had more knowledge about the reader code.
          I'll do some more manual testing with your patch applied.
          This will create extra merging work for HBASE-4422 and HBASE-4344

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          <https://reviews.apache.org/r/2136/#comment5184>

          This is good.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          <https://reviews.apache.org/r/2136/#comment5182>

          I like this. HFileReaderV2 implementing HFileBlock.BasicReader was strange.

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          <https://reviews.apache.org/r/2136/#comment5183>

          No more casting, awesome.

          Very minor nit, but why not do reader.getDataBlockIndexReader().seekToDataBlock(...) as you do below?

          • Lars

          On 2011-09-30 20:41:01, Mikhail Bautin wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2136/

          -----------------------------------------------------------

          (Updated 2011-09-30 20:41:01)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Summary

          -------

          This fixes a couple of long-existing code issues in HFile v2:

          - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).

          - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.

          - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).

          - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.

          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs

          -----

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e

          Diff: https://reviews.apache.org/r/2136/diff

          Testing

          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/#review2232 ----------------------------------------------------------- Ship it! I love it. This is what I should have done with HBASE-4496 if I had had more knowledge about the reader code. I'll do some more manual testing with your patch applied. This will create extra merging work for HBASE-4422 and HBASE-4344 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java < https://reviews.apache.org/r/2136/#comment5184 > This is good. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java < https://reviews.apache.org/r/2136/#comment5182 > I like this. HFileReaderV2 implementing HFileBlock.BasicReader was strange. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java < https://reviews.apache.org/r/2136/#comment5183 > No more casting, awesome. Very minor nit, but why not do reader.getDataBlockIndexReader().seekToDataBlock(...) as you do below? Lars On 2011-09-30 20:41:01, Mikhail Bautin wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-09-30 20:41:01) Review request for hbase, Jonathan Gray and Lars Hofhansl. Summary ------- This fixes a couple of long-existing code issues in HFile v2: - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs ----- src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367 src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/#review2234
          -----------------------------------------------------------

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          <https://reviews.apache.org/r/2136/#comment5186>

          This is mainly to keep lines <80 characters and for readability.

          • Mikhail

          On 2011-09-30 20:41:01, Mikhail Bautin wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2136/

          -----------------------------------------------------------

          (Updated 2011-09-30 20:41:01)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Summary

          -------

          This fixes a couple of long-existing code issues in HFile v2:

          - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).

          - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.

          - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).

          - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.

          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs

          -----

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e

          Diff: https://reviews.apache.org/r/2136/diff

          Testing

          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/#review2234 ----------------------------------------------------------- src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java < https://reviews.apache.org/r/2136/#comment5186 > This is mainly to keep lines <80 characters and for readability. Mikhail On 2011-09-30 20:41:01, Mikhail Bautin wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-09-30 20:41:01) Review request for hbase, Jonathan Gray and Lars Hofhansl. Summary ------- This fixes a couple of long-existing code issues in HFile v2: - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs ----- src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367 src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          Lars Hofhansl added a comment -

          Tried Mikhail's patch with my test scenario. The scan's cacheBlock is now correctly honored.

          Show
          Lars Hofhansl added a comment - Tried Mikhail's patch with my test scenario. The scan's cacheBlock is now correctly honored.
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/#review2256
          -----------------------------------------------------------

          Ship it!

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          <https://reviews.apache.org/r/2136/#comment5234>

          an is not needed

          • Ted

          On 2011-09-30 20:41:01, Mikhail Bautin wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2136/

          -----------------------------------------------------------

          (Updated 2011-09-30 20:41:01)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Summary

          -------

          This fixes a couple of long-existing code issues in HFile v2:

          - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).

          - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.

          - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).

          - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.

          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs

          -----

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367

          src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767

          src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e

          Diff: https://reviews.apache.org/r/2136/diff

          Testing

          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/#review2256 ----------------------------------------------------------- Ship it! src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java < https://reviews.apache.org/r/2136/#comment5234 > an is not needed Ted On 2011-09-30 20:41:01, Mikhail Bautin wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-09-30 20:41:01) Review request for hbase, Jonathan Gray and Lars Hofhansl. Summary ------- This fixes a couple of long-existing code issues in HFile v2: - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs ----- src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 4dc1367 src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 5e98375 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java b429819 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 953896e src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 13d5e70 src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1cf7767 src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java eec566e Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          Ted Yu added a comment -

          Integrated to 0.92 and TRUNK.

          Thanks for the patch Mikhail.

          Thanks for the review Lars.

          Show
          Ted Yu added a comment - Integrated to 0.92 and TRUNK. Thanks for the patch Mikhail. Thanks for the review Lars.
          Hide
          Lars Hofhansl added a comment -

          Mikhail did the work, he deserves the credit

          Show
          Lars Hofhansl added a comment - Mikhail did the work, he deserves the credit
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/
          -----------------------------------------------------------

          (Updated 2011-10-02 21:49:56.235130)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Changes
          -------

          Addressing Ted's comment and fixing TestBlocksRead, since we are now reading fewer blocks in a couple of cases. Re-running test suite just in case.

          Summary
          -------

          This fixes a couple of long-existing code issues in HFile v2:

          • Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).
          • Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.
          • Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).
          • Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.
          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd

          Diff: https://reviews.apache.org/r/2136/diff

          Testing
          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-10-02 21:49:56.235130) Review request for hbase, Jonathan Gray and Lars Hofhansl. Changes ------- Addressing Ted's comment and fixing TestBlocksRead, since we are now reading fewer blocks in a couple of cases. Re-running test suite just in case. Summary ------- This fixes a couple of long-existing code issues in HFile v2: Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs (updated) src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          jiraposter@reviews.apache.org added a comment -

          On 2011-10-02 20:43:29, Ted Yu wrote:

          > src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java, line 924

          > <https://reviews.apache.org/r/2136/diff/1/?file=47147#file47147line924>

          >

          > an is not needed

          Sorry – did not realize that this was already committed. The result was that in the second version of this patch the only changes left were in TestBlocksRead. I will re-run the whole test suite again.

          • Mikhail

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/#review2256
          -----------------------------------------------------------

          On 2011-10-02 21:49:56, Mikhail Bautin wrote:

          -----------------------------------------------------------

          This is an automatically generated e-mail. To reply, visit:

          https://reviews.apache.org/r/2136/

          -----------------------------------------------------------

          (Updated 2011-10-02 21:49:56)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Summary

          -------

          This fixes a couple of long-existing code issues in HFile v2:

          - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).

          - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.

          - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).

          - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.

          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs

          -----

          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd

          Diff: https://reviews.apache.org/r/2136/diff

          Testing

          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - On 2011-10-02 20:43:29, Ted Yu wrote: > src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java, line 924 > < https://reviews.apache.org/r/2136/diff/1/?file=47147#file47147line924 > > > an is not needed Sorry – did not realize that this was already committed. The result was that in the second version of this patch the only changes left were in TestBlocksRead. I will re-run the whole test suite again. Mikhail ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/#review2256 ----------------------------------------------------------- On 2011-10-02 21:49:56, Mikhail Bautin wrote: ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-10-02 21:49:56) Review request for hbase, Jonathan Gray and Lars Hofhansl. Summary ------- This fixes a couple of long-existing code issues in HFile v2: - Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). - Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. - Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). - Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs ----- src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          jiraposter@reviews.apache.org added a comment -

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          https://reviews.apache.org/r/2136/
          -----------------------------------------------------------

          (Updated 2011-10-02 21:58:43.274175)

          Review request for hbase, Jonathan Gray and Lars Hofhansl.

          Changes
          -------

          Fixing comment to be consistent with the expected number of blocks in code.

          Summary
          -------

          This fixes a couple of long-existing code issues in HFile v2:

          • Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2).
          • Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block.
          • Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion).
          • Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex.

          This addresses bug HBASE-4496.
          https://issues.apache.org/jira/browse/HBASE-4496

          Diffs (updated)


          src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd

          Diff: https://reviews.apache.org/r/2136/diff

          Testing
          -------

          This is in production in Facebook's hbase-89 branch.

          Still testing this open-source patch – please don't commit yet.

          Thanks,

          Mikhail

          Show
          jiraposter@reviews.apache.org added a comment - ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2136/ ----------------------------------------------------------- (Updated 2011-10-02 21:58:43.274175) Review request for hbase, Jonathan Gray and Lars Hofhansl. Changes ------- Fixing comment to be consistent with the expected number of blocks in code. Summary ------- This fixes a couple of long-existing code issues in HFile v2: Making seekBefore cache the previous block it has to read when the scanner happens to be at the first key of a block (this was a performance regression introduced in HFile v2). Fixing the accounting of the number of blocks read for the one-level index case in HFileBlockIndex.seekToDataBlock if the current block is the same as the requested block. Getting rid of HFileBlock.BasicReader, which was used both by FSReaderV2 and HFileReaderV2, but the former did not cache blocks (a source of confusion). Adding a new interface HFile.CachingBlockReader instead, which is implemented by HFile readers and passed to HFileBlockIndex. This addresses bug HBASE-4496 . https://issues.apache.org/jira/browse/HBASE-4496 Diffs (updated) src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 019fddd Diff: https://reviews.apache.org/r/2136/diff Testing ------- This is in production in Facebook's hbase-89 branch. Still testing this open-source patch – please don't commit yet. Thanks, Mikhail
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #37 (See https://builds.apache.org/job/HBase-0.92/37/)
          HBASE-4496 HFile V2 does not honor setCacheBlocks when scanning (Lars and Mikhail)

          tedyu :
          Files :

          • /hbase/branches/0.92/CHANGES.txt
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
          • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #37 (See https://builds.apache.org/job/HBase-0.92/37/ ) HBASE-4496 HFile V2 does not honor setCacheBlocks when scanning (Lars and Mikhail) tedyu : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
          Hide
          Ted Yu added a comment -

          Fixes for TestBlocksRead.java

          Show
          Ted Yu added a comment - Fixes for TestBlocksRead.java
          Hide
          Ted Yu added a comment -

          Addendum integrated to 0.92 and TRUNK.

          Show
          Ted Yu added a comment - Addendum integrated to 0.92 and TRUNK.
          Hide
          Hudson added a comment -

          Integrated in HBase-0.92 #38 (See https://builds.apache.org/job/HBase-0.92/38/)
          HBASE-4496 Addendum for TestBlocksRead

          tedyu :
          Files :

          • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
          Show
          Hudson added a comment - Integrated in HBase-0.92 #38 (See https://builds.apache.org/job/HBase-0.92/38/ ) HBASE-4496 Addendum for TestBlocksRead tedyu : Files : /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java

            People

            • Assignee:
              Mikhail Bautin
              Reporter:
              Lars Hofhansl
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development