1. HBase
  2. HBASE-4496

HFile V2 does not honor setCacheBlocks when scanning.


    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.92.0, 0.94.0
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: regionserver
    • Labels:
    • Hadoop Flags:


      While testing the LRU cache during the scanning I noticed quite some churn in the cache even when Scan.cacheBlocks is set to false. After debugging this, I found that HFile V2 always caches blocks in the LRU cache regardless of the cacheBlocks setting.

      Here's a trace (from Eclipse) showing the problem:

      HFileReaderV2.readBlock(long, int, boolean, boolean, boolean) line: 279
      HFileReaderV2.readBlockData(long, long, int, boolean) line: 219
      HFileBlockIndex$BlockIndexReader.seekToDataBlock(byte[], int, int, HFileBlock) line: 191
      HFileReaderV2$ScannerV2.seekTo(byte[], int, int, boolean) line: 502
      HFileReaderV2$ScannerV2.reseekTo(byte[], int, int) line: 539
      StoreFileScanner.reseekAtOrAfter(HFileScanner, KeyValue) line: 151
      StoreFileScanner.reseek(KeyValue) line: 110
      KeyValueHeap.reseek(KeyValue) line: 255
      StoreScanner.reseek(KeyValue) line: 409
      StoreScanner.next(List<KeyValue>, int) line: 304
      KeyValueHeap.next(List<KeyValue>, int) line: 114
      KeyValueHeap.next(List<KeyValue>) line: 143
      HRegion$RegionScannerImpl.nextRow(byte[]) line: 2774
      HRegion$RegionScannerImpl.nextInternal(int) line: 2722
      HRegion$RegionScannerImpl.next(List<KeyValue>, int) line: 2682
      HRegion$RegionScannerImpl.next(List<KeyValue>) line: 2699
      HRegionServer.next(long, int) line: 2092

      Every scanner.next causes a reseek, which eventually causes a call to HFileBlockIndex$BlockIndexReader.seekToDataBlock(...) at which point the cacheBlocks information is lost. HFileReaderV2.readBlockData calls HFileReaderV2.readBlock with cacheBlocks set unconditionally to true.

      The fix is not immediately clear, unless we want to pass cacheBlocks to HFileBlockIndex$BlockIndexReader.seekToDataBlock and then on to HFileBlock.BasicReader.readBlockData and all its implementers, which is ugly as readBlockData should not care about caching.

      Avoiding caching during scans is somewhat important for us.

      1. 4496.txt
        14 kB
        Lars Hofhansl
      2. 4496.final
        19 kB
        Ted Yu
      3. 4496.addendum
        3 kB
        Ted Yu

        Issue Links


          No work has yet been logged on this issue.


            • Assignee:
              Mikhail Bautin
              Lars Hofhansl
            • Votes:
              2 Vote for this issue
              5 Start watching this issue


              • Created: