HBase / HBASE-21879

Read HFile's block to ByteBuffer directly instead of to byte[] to reduce young GC

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Before this issue, the read path was 100% off-heap only when the block was already in the BucketCache. On a cache miss, the RegionServer had to read the block through an on-heap API, which caused high young-GC pressure.

      This issue makes the block read go off-heap even when the block is read directly from the filesystem. Off-heap reads require Hadoop >= 2.9.3; older Hadoop versions still work, but blocks continue to be read on-heap. The feature also requires HBASE-21946, which is not yet in place as of this writing (hbase-2.3.0).

      We have written a detailed doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit#heading=h.nch5d72p27ex

    Description

      In HFileBlock#readBlockDataInternal, we have the following:

      @VisibleForTesting
      protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
          long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
          throws IOException {
        // .....
        // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
        byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
        int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
            onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
        if (headerBuf != null) {
          // ...
        }
        // ...
      }
      

      In the read path, we still read the block from the HFile into an on-heap byte[], then copy that byte[] to the off-heap BucketCache asynchronously. In my 100% get performance test I also observed frequent young GCs; the largest memory footprint in the young gen should be the on-heap block byte[].

      In fact, we can read the HFile's block into a ByteBuffer directly instead of into a byte[], to reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer read interface, but 2.7+ supports it, so I think we can fix this now.

      Will provide a patch and some perf comparison for this.
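
To make the difference between the two read styles concrete, here is a minimal sketch. It is not the HBase patch: it assumes a local file and the JDK's `FileChannel` positional read as a stand-in for the HDFS client, and the class and method names (`DirectBlockRead`, `readBlockOnHeap`, `readBlockInto`) are hypothetical. The first method allocates a block-sized byte[] on every call, as the current code does; the second fills a caller-supplied buffer, which may be a direct (off-heap) ByteBuffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBlockRead {

  /** Current style: each call allocates a fresh on-heap byte[] the size of the block. */
  static byte[] readBlockOnHeap(FileChannel ch, long offset, int size) throws IOException {
    byte[] onDiskBlock = new byte[size];
    readFully(ch, ByteBuffer.wrap(onDiskBlock), offset);
    return onDiskBlock;
  }

  /** Proposed style: fill a caller-supplied buffer (heap or direct); no block-sized byte[]. */
  static void readBlockInto(FileChannel ch, long offset, int size, ByteBuffer dst)
      throws IOException {
    if (dst.remaining() < size) {
      throw new IllegalArgumentException("destination buffer too small");
    }
    // Read through a bounded view so exactly `size` bytes are consumed.
    ByteBuffer view = dst.duplicate();
    view.limit(view.position() + size);
    readFully(ch, view, offset);
    dst.position(dst.position() + size);
  }

  /** Positional read loop (pread-like): does not move the channel's own position. */
  private static void readFully(FileChannel ch, ByteBuffer dst, long offset) throws IOException {
    while (dst.hasRemaining()) {
      int n = ch.read(dst, offset);
      if (n < 0) {
        throw new EOFException("unexpected end of file at offset " + offset);
      }
      offset += n;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("block", ".bin");
    try {
      Files.write(file, "HEADERblock-payload".getBytes(StandardCharsets.US_ASCII));
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer offheap = ByteBuffer.allocateDirect(64); // would come from a pool in practice
        readBlockInto(ch, 6, 13, offheap);
        offheap.flip();
        System.out.println(StandardCharsets.US_ASCII.decode(offheap));
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

In this sketch the destination buffer is allocated once and could be reused across reads, so the per-block garbage in the young generation is limited to small wrapper objects rather than the block contents.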

      Attachments

        1. QPS-latencies-before-HBASE-21879.png
          279 kB
          Zheng Hu
        2. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        3. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        4. gc-data-before-HBASE-21879.png
          385 kB
          Zheng Hu

        Issue Links

          1. Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool (Sub-task, Closed, Zheng Hu)
          2. Make the HFileBlock#validateChecksum can accept ByteBuff as an input. (Sub-task, Closed, Zheng Hu)
          3. Notify users if the ByteBufAllocator is always allocating ByteBuffers from heap which means the increasing GC pressure (Sub-task, Closed, Zheng Hu)
          4. Make the Compression#decompress can accept ByteBuff as input (Sub-task, Closed, Zheng Hu)
          5. Abstract an ByteBuffOutputStream for building cell block (Sub-task, Closed, Unassigned)
          6. Consider simplifying the logic of BucketCache eviction. (Sub-task, Closed, Unassigned)
          7. Unify refCount of BucketEntry and refCount of hbase.nio.ByteBuff into one (Sub-task, Closed, Zheng Hu)
          8. Use ByteBuff's refcnt to track the life cycle of data block (Sub-task, Closed, Zheng Hu)
          9. Rewrite the block reading methods by using hbase.nio.ByteBuff (Sub-task, Closed, Zheng Hu)
          10. The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the newly created HFileBlock (Sub-task, Closed, Zheng Hu)
          11. Ensure that the block cached in the LRUBlockCache offheap is allocated from heap (Sub-task, Closed, Zheng Hu)
          12. Change to release mob hfile's block after rpc server shipped response to client (Sub-task, Closed, Zheng Hu)
          13. ByteBufferIOEngine should support write off-heap ByteBuff to the bufferArray (Sub-task, Closed, Zheng Hu)
          14. Remove the returnBlock method because we can just call HFileBlock#release directly (Sub-task, Closed, Zheng Hu)
          15. Evaluate the get/scan performance after reading HFile block into offheap directly (Sub-task, Closed, Zheng Hu)
          16. Improve the metrics in ByteBuffAllocator (Sub-task, Closed, Zheng Hu)
          17. Retain an ByteBuff with refCnt=0 when getBlock from LRUCache (Sub-task, Closed, Zheng Hu)
          18. Add a UT to address the HFileBlock#heapSize() in TestHeapSize (Sub-task, Closed, Zheng Hu)
          19. Some paths in HFileScannerImpl did not consider block#release which will exhaust the ByteBuffAllocator (Sub-task, Closed, Zheng Hu)
          20. It's better to use 65KB as the default buffer size in ByteBuffAllocator (Sub-task, Closed, Zheng Hu)
          21. Separate the heap HFileBlock and offheap HFileBlock because the heap block won't need refCnt and save into prevBlocks list before shipping (Sub-task, Closed, Zheng Hu)
          22. Optimize the MultiByteBuff#get(ByteBuffer, offset, len) (Sub-task, Closed, Zheng Hu)
          23. The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled BlockCache (Sub-task, Closed, Zheng Hu)
          24. There's still too much cpu wasting on validating checksum even if buffer.size=65KB (Sub-task, Closed, Unassigned)
          25. Align the config keys and add document for offheap read in HBase Book. (Sub-task, Closed, Zheng Hu)
          26. Deprecated the hbase.ipc.server.reservoir.initial.buffer.size & hbase.ipc.server.reservoir.initial.max for HBase2.x compatibility (Sub-task, Closed, Zheng Hu)
          27. Address the final overview reviewing comments of HBASE-21879 (Sub-task, Closed, Zheng Hu)
          28. Backport offheap block reading (HBASE-21879) to branch-2 (Sub-task, Closed, Zheng Hu)
          29. The HeapAllocationRatio in WebUI is not accurate because all of the heap allocation will happen in another separated allocator named HEAP (Sub-task, Closed, Zheng Hu)
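
Several of the sub-tasks above revolve around two ideas: a pooled allocator that may fall back to heap allocation, and a reference count that decides when a buffer can go back to the pool. The sketch below illustrates those two ideas only. It is not HBase's ByteBuffAllocator or ByteBuff; every name in it (`PooledBufferSketch`, `Pool`, `RefCountedBuffer`, `heapAllocations`) is hypothetical, and the sizing and fallback policy are assumptions chosen for brevity.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PooledBufferSketch {

  /** Fixed set of direct buffers; hands out heap buffers once the pool is empty. */
  static final class Pool {
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;
    /** Counts fallbacks to heap, the kind of signal a metric or warning could be built on. */
    final AtomicInteger heapAllocations = new AtomicInteger();

    Pool(int bufferSize, int maxBuffers) {
      this.bufferSize = bufferSize;
      for (int i = 0; i < maxBuffers; i++) {
        free.add(ByteBuffer.allocateDirect(bufferSize));
      }
    }

    RefCountedBuffer allocate() {
      ByteBuffer bb = free.poll();
      if (bb == null) {
        // Pool exhausted: fall back to an unpooled on-heap buffer.
        heapAllocations.incrementAndGet();
        return new RefCountedBuffer(ByteBuffer.allocate(bufferSize), null);
      }
      bb.clear();
      return new RefCountedBuffer(bb, this);
    }

    private void recycle(ByteBuffer bb) {
      free.add(bb);
    }

    int available() {
      return free.size();
    }
  }

  /** A buffer plus a reference count; the last release returns it to its pool. */
  static final class RefCountedBuffer {
    final ByteBuffer buf;
    private final Pool owner; // null for unpooled heap buffers
    private final AtomicInteger refCnt = new AtomicInteger(1);

    RefCountedBuffer(ByteBuffer buf, Pool owner) {
      this.buf = buf;
      this.owner = owner;
    }

    RefCountedBuffer retain() {
      int c;
      do {
        c = refCnt.get();
        if (c <= 0) {
          throw new IllegalStateException("buffer already released");
        }
      } while (!refCnt.compareAndSet(c, c + 1));
      return this;
    }

    /** Returns true if this call dropped the last reference. */
    boolean release() {
      int c = refCnt.decrementAndGet();
      if (c < 0) {
        throw new IllegalStateException("buffer released too many times");
      }
      if (c == 0 && owner != null) {
        owner.recycle(buf);
      }
      return c == 0;
    }
  }

  public static void main(String[] args) {
    Pool pool = new Pool(65 * 1024, 1);
    RefCountedBuffer pooled = pool.allocate();   // direct buffer from the pool
    RefCountedBuffer fallback = pool.allocate(); // pool is empty, so this one is on-heap
    System.out.println(pooled.buf.isDirect() + " " + fallback.buf.isDirect());
    pooled.release();
    fallback.release();
  }
}
```

A reader that forgets to call `release()` in this sketch never returns its buffer, so later allocations silently fall back to heap; that is the general failure mode a missing release can cause in any pooled design.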

            People

              Assignee: openinx Zheng Hu
              Reporter: openinx Zheng Hu
              Votes: 0
              Watchers: 24
