  1. HBase
  2. HBASE-21879

Read HFile's block to ByteBuffer directly instead of to byte[] to reduce young GC


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Before this issue, the read path was 100% off-heap only when the block was already in the BucketCache. On a cache miss, the RegionServer had to read the block through an on-heap API, which caused high young-GC pressure.

      This issue makes the block read go off-heap even when the block is read directly from the filesystem. Off-heap reads require Hadoop >= 2.9.3; older Hadoop versions still work, but blocks continue to be read on-heap. It also requires HBASE-21946, which is not yet in place as of this writing (hbase-2.3.0).

      We have written a detailed doc about the implementation, performance, and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit#heading=h.nch5d72p27ex

      Description

      In HFileBlock#readBlockDataInternal, we have the following:

      @VisibleForTesting
      protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
          long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
          throws IOException {
        // ...
        // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
        byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
        int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
            onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
        if (headerBuf != null) {
          // ...
        }
        // ...
      }
      

      In the read path, we still read the block from the HFile into an on-heap byte[] and then copy that byte[] into the off-heap BucketCache asynchronously. In my 100% get performance test, I observed frequent young GCs; the largest memory footprint in the young generation is most likely the on-heap block byte[].

      In fact, we can read the HFile block into a ByteBuffer directly instead of into a byte[], which should reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer read interface, but the 2.7+ client supports one, so I think we can fix this now.

      Will provide a patch and some performance comparisons for this.
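
      To illustrate the idea in this description, here is a minimal sketch of reading a block straight into an off-heap (direct) ByteBuffer with a positional read. It is not the HBase patch: it uses java.nio.channels.FileChannel on a local file as a stand-in for the HDFS input stream, and the names DirectBlockRead and readBlock are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: a local FileChannel stands in for the HDFS stream.
public class DirectBlockRead {

  /**
   * Reads exactly {@code length} bytes starting at {@code offset} into a direct
   * (off-heap) buffer, so no intermediate on-heap byte[] is allocated per block.
   */
  static ByteBuffer readBlock(FileChannel channel, long offset, int length) throws IOException {
    ByteBuffer block = ByteBuffer.allocateDirect(length);
    long position = offset;
    while (block.hasRemaining()) {
      // Positional read: does not move the channel's own position (similar to pread).
      int n = channel.read(block, position);
      if (n < 0) {
        throw new IOException("Premature EOF, still missing " + block.remaining() + " bytes");
      }
      position += n;
    }
    block.flip(); // switch the buffer from writing to reading
    return block;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("block", ".bin");
    try {
      Files.write(file, "headerBLOCKDATAtrailer".getBytes(StandardCharsets.US_ASCII));
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer block = readBlock(ch, 6, 9);
        byte[] copy = new byte[block.remaining()];
        block.get(copy); // copied on-heap here only so the content can be printed
        System.out.println(block.isDirect() + " " + new String(copy, StandardCharsets.US_ASCII));
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

      In this sketch a fresh direct buffer is allocated per read to keep the example short; a real server would more likely take buffers from a pool, since allocating direct buffers is comparatively expensive.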

        Attachments

        1. gc-data-before-HBASE-21879.png
          385 kB
          Zheng Hu
        2. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        3. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        4. QPS-latencies-before-HBASE-21879.png
          279 kB
          Zheng Hu

        Issue Links

        1. Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool
          Sub-task, Resolved, Zheng Hu
        2. Make the HFileBlock#validateChecksum can accept ByteBuff as an input.
          Sub-task, Resolved, Zheng Hu
        3. Notify users if the ByteBufAllocator is always allocating ByteBuffers from heap which means the increasing GC pressure
          Sub-task, Resolved, Zheng Hu
        4. Make the Compression#decompress can accept ByteBuff as input
          Sub-task, Resolved, Zheng Hu
        5. Abstract an ByteBuffOutputStream for building cell block
          Sub-task, Resolved, Zheng Hu
        6. Consider simplifying the logic of BucketCache eviction.
          Sub-task, Resolved, Unassigned
        7. Unify refCount of BucketEntry and refCount of hbase.nio.ByteBuff into one
          Sub-task, Resolved, Zheng Hu
        8. Use ByteBuff's refcnt to track the life cycle of data block
          Sub-task, Resolved, Zheng Hu
        9. Rewrite the block reading methods by using hbase.nio.ByteBuff
          Sub-task, Resolved, Zheng Hu
        10. The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the newly created HFileBlock
          Sub-task, Resolved, Zheng Hu
        11. Ensure that the block cached in the LRUBlockCache offheap is allocated from heap
          Sub-task, Resolved, Zheng Hu
        12. Change to release mob hfile's block after rpc server shipped response to client
          Sub-task, Resolved, Zheng Hu
        13. ByteBufferIOEngine should support write off-heap ByteBuff to the bufferArray
          Sub-task, Resolved, Zheng Hu
        14. Remove the returnBlock method because we can just call HFileBlock#release directly
          Sub-task, Resolved, Zheng Hu
        15. Evaluate the get/scan performance after reading HFile block into offheap directly
          Sub-task, Resolved, Zheng Hu
        16. Improve the metrics in ByteBuffAllocator
          Sub-task, Resolved, Zheng Hu
        17. Retain an ByteBuff with refCnt=0 when getBlock from LRUCache
          Sub-task, Resolved, Zheng Hu
        18. Add a UT to address the HFileBlock#heapSize() in TestHeapSize
          Sub-task, Resolved, Zheng Hu
        19. Some paths in HFileScannerImpl did not consider block#release which will exhaust the ByteBuffAllocator
          Sub-task, Resolved, Zheng Hu
        20. It's better to use 65KB as the default buffer size in ByteBuffAllocator
          Sub-task, Resolved, Zheng Hu
        21. Separate the heap HFileBlock and offheap HFileBlock because the heap block won't need refCnt and save into prevBlocks list before shipping
          Sub-task, Resolved, Zheng Hu
        22. Optimize the MultiByteBuff#get(ByteBuffer, offset, len)
          Sub-task, Resolved, Zheng Hu
        23. The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled BlockCache
          Sub-task, Resolved, Zheng Hu
        24. There's still too much cpu wasting on validating checksum even if buffer.size=65KB
          Sub-task, Resolved, Zheng Hu
        25. Align the config keys and add document for offheap read in HBase Book.
          Sub-task, Resolved, Zheng Hu
        26. Deprecated the hbase.ipc.server.reservoir.initial.buffer.size & hbase.ipc.server.reservoir.initial.max for HBase2.x compatibility
          Sub-task, Resolved, Zheng Hu
        27. Address the final overview reviewing comments of HBASE-21879
          Sub-task, Resolved, Zheng Hu
        28. Backport offheap block reading (HBASE-21879) to branch-2
          Sub-task, Resolved, Zheng Hu
        29. The HeapAllocationRatio in WebUI is not accurate because all of the heap allocation will happen in another separated allocator named HEAP
          Sub-task, Resolved, Zheng Hu
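
        Several of the sub-tasks listed here revolve around a pooled buffer allocator with a heap fallback and reference counting to track a block's life cycle. The sketch below shows those three ideas in miniature, assuming a fixed pool of direct buffers. It is not HBase's ByteBuffAllocator: the classes RefCountedPoolSketch, Pool, and Block and all of their methods are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from HBase.
public class RefCountedPoolSketch {

  /** Fixed-size pool of direct buffers that falls back to heap buffers when empty. */
  static final class Pool {
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;
    /** Counts heap fallbacks, so operators can notice rising GC pressure. */
    final AtomicInteger heapAllocations = new AtomicInteger();

    Pool(int bufferCount, int bufferSize) {
      this.bufferSize = bufferSize;
      for (int i = 0; i < bufferCount; i++) {
        free.add(ByteBuffer.allocateDirect(bufferSize));
      }
    }

    Block allocate() {
      ByteBuffer buf = free.poll();
      if (buf == null) {
        // Pool exhausted: fall back to an on-heap buffer that the GC will reclaim.
        heapAllocations.incrementAndGet();
        return new Block(ByteBuffer.allocate(bufferSize), null);
      }
      buf.clear();
      return new Block(buf, this);
    }

    int available() {
      return free.size();
    }
  }

  /** A buffer plus a reference count; the last release returns it to its pool. */
  static final class Block {
    final ByteBuffer data;
    private final Pool owner; // null for heap-allocated blocks
    private final AtomicInteger refCnt = new AtomicInteger(1);

    Block(ByteBuffer data, Pool owner) {
      this.data = data;
      this.owner = owner;
    }

    void retain() {
      int current;
      do {
        current = refCnt.get();
        if (current <= 0) {
          // A released block may already be reused elsewhere; never resurrect it.
          throw new IllegalStateException("retain() on a released block");
        }
      } while (!refCnt.compareAndSet(current, current + 1));
    }

    /** Returns true when this call dropped the last reference. */
    boolean release() {
      int remaining = refCnt.decrementAndGet();
      if (remaining < 0) {
        throw new IllegalStateException("release() called too many times");
      }
      if (remaining == 0 && owner != null) {
        owner.free.add(data);
      }
      return remaining == 0;
    }
  }

  public static void main(String[] args) {
    Pool pool = new Pool(1, 64 * 1024);
    Block pooled = pool.allocate();   // direct buffer taken from the pool
    Block fallback = pool.allocate(); // pool is empty, so this one is on-heap
    System.out.println(pooled.data.isDirect() + " " + fallback.data.isDirect());
    pooled.release();                 // last reference: buffer goes back to the pool
    System.out.println(pool.available() + " " + pool.heapAllocations.get());
  }
}
```

        The point of the sketch is that every code path that obtains a block must eventually call release exactly once per reference; a path that forgets to do so keeps the buffer out of the pool, which pushes later allocations onto the heap.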


            People

            • Assignee:
              openinx Zheng Hu
              Reporter:
              openinx Zheng Hu
