  1. HBase
  2. HBASE-21879

Read HFile's block to ByteBuffer directly instead of to byte[] to reduce young GC


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 2.3.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Before this issue, the read path was 100% offheap only when every block read hit the BucketCache. On a cache miss, the RegionServer had to read the block through an on-heap API, which caused high young GC pressure.
      With this issue, blocks are read into offheap buffers even when they are read directly from the filesystem. This requires Hadoop >= 2.9.3; with older Hadoop versions everything still works, but blocks are read on-heap. We have written a detailed doc about the implementation, performance and practice here: https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit#heading=h.nch5d72p27ex. Please read it for more details.

      Description

      In HFileBlock#readBlockDataInternal, we have the following:

      @VisibleForTesting
      protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
          long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
          throws IOException {
        // .....
        // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
        byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
        int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
            onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
        if (headerBuf != null) {
          // ...
        }
        // ...
      }
      

      In the read path, we still read the block from the HFile into an on-heap byte[], then copy that byte[] to the offheap bucket cache asynchronously. In my 100% get performance test, I also observed frequent young GC; the largest memory footprint in the young gen should be the on-heap block byte[].

      In fact, we can read the HFile's block into a ByteBuffer directly instead of into a byte[] to reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer reading interface, but 2.7+ supports it, so I think we can fix this now.
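
The idea above can be illustrated with a minimal sketch. It is not the HBase patch: it uses the plain JDK FileChannel as a stand-in for the HDFS client, and the class name DirectBlockReadSketch and the method readBlock are hypothetical names chosen for illustration. It shows a positional read that fills a direct (off-heap) ByteBuffer, so the caller never allocates an on-heap byte[] for the block.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch, not HBase code: read a block into an off-heap buffer. */
class DirectBlockReadSketch {

  /**
   * Positional read of {@code length} bytes starting at {@code offset} into a
   * direct (off-heap) buffer. The caller allocates no on-heap byte[].
   */
  static ByteBuffer readBlock(FileChannel ch, long offset, int length) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(length);
    long pos = offset;
    while (buf.hasRemaining()) {
      // Positional read: does not change the channel's own position.
      int n = ch.read(buf, pos);
      if (n < 0) {
        throw new IOException("Premature EOF at position " + pos);
      }
      pos += n;
    }
    buf.flip(); // switch from writing into the buffer to reading from it
    return buf;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("block", ".bin");
    try {
      Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer block = readBlock(ch, 4, 3);
        System.out.println(block.isDirect() + " " + block.remaining() + " " + block.get(0));
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

In a real read path the direct buffer would come from a pool rather than from allocateDirect on every read, since allocating direct buffers per block is expensive.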

      Will provide a patch and some perf comparison for this.

        Attachments

        1. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        2. HBASE-21879.v1.patch
          76 kB
          Zheng Hu
        3. QPS-latencies-before-HBASE-21879.png
          279 kB
          Zheng Hu
        4. gc-data-before-HBASE-21879.png
          385 kB
          Zheng Hu

          Issue Links

          1. Abstract an ByteBuffAllocator to allocate/free ByteBuffer in ByteBufferPool (Sub-task, Resolved, Zheng Hu)
          2. Make the HFileBlock#validateChecksum can accept ByteBuff as an input. (Sub-task, Resolved, Zheng Hu)
          3. Notify users if the ByteBufAllocator is always allocating ByteBuffers from heap which means the increacing GC pressure (Sub-task, Resolved, Zheng Hu)
          4. Make the Compression#decompress can accept ByteBuff as input (Sub-task, Resolved, Zheng Hu)
          5. Replace the byte[] pread by ByteBuffer pread in HFileBlock reading once HDFS-3246 prepared (Sub-task, Open, Zheng Hu)
          6. Abstract an ByteBuffOutputStream for building cell block (Sub-task, Resolved, Zheng Hu)
          7. Consider simplifying the logic of BucketCache eviction. (Sub-task, Resolved, Unassigned)
          8. Unify refCount of BucketEntry and refCount of hbase.nio.ByteBuff into one (Sub-task, Resolved, Zheng Hu)
          9. Use ByteBuff's refcnt to track the life cycle of data block (Sub-task, Resolved, Zheng Hu)
          10. Rewrite the block reading methods by using hbase.nio.ByteBuff (Sub-task, Resolved, Zheng Hu)
          11. The HFileBlock#CacheableDeserializer should pass ByteBuffAllocator to the newly created HFileBlock (Sub-task, Resolved, Zheng Hu)
          12. Ensure that the block cached in the LRUBlockCache offheap is allocated from heap (Sub-task, Resolved, Zheng Hu)
          13. Change to release mob hfile's block after rpc server shipped response to client (Sub-task, Resolved, Zheng Hu)
          14. ByteBufferIOEngine should support write off-heap ByteBuff to the bufferArray (Sub-task, Resolved, Zheng Hu)
          15. Remove the returnBlock method because we can just call HFileBlock#release directly (Sub-task, Resolved, Zheng Hu)
          16. Evaluate the get/scan performance after reading HFile block into offheap directly (Sub-task, Resolved, Zheng Hu)
          17. Improve the metrics in ByteBuffAllocator (Sub-task, Resolved, Zheng Hu)
          18. Retain an ByteBuff with refCnt=0 when getBlock from LRUCache (Sub-task, Resolved, Zheng Hu)
          19. Add a UT to address the HFileBlock#heapSize() in TestHeapSize (Sub-task, Resolved, Zheng Hu)
          20. Some paths in HFileScannerImpl did not consider block#release which will exhaust the ByteBuffAllocator (Sub-task, Resolved, Zheng Hu)
          21. It's better to use 65KB as the default buffer size in ByteBuffAllocator (Sub-task, Resolved, Zheng Hu)
          22. Separate the heap HFileBlock and offheap HFileBlock because the heap block won't need refCnt and save into prevBlocks list before shipping (Sub-task, Resolved, Zheng Hu)
          23. Optimize the MultiByteBuff#get(ByteBuffer, offset, len) (Sub-task, Resolved, Zheng Hu)
          24. The HFileReaderImpl#shouldUseHeap return the incorrect true when disabled BlockCache (Sub-task, Resolved, Zheng Hu)
          25. There's still too much cpu wasting on validating checksum even if buffer.size=65KB (Sub-task, Resolved, Zheng Hu)
          26. Align the config keys and add document for offheap read in HBase Book. (Sub-task, Resolved, Zheng Hu)
          27. Deprecated the hbase.ipc.server.reservoir.initial.buffer.size & hbase.ipc.server.reservoir.initial.max for HBase2.x compatibility (Sub-task, Resolved, Zheng Hu)
          28. Address the final overview reviewing comments of HBASE-21879 (Sub-task, Resolved, Zheng Hu)
          29. Backport offheap block reading (HBASE-21879) to branch-2 (Sub-task, Resolved, Zheng Hu)
          30. The HeapAllocationRatio in WebUI is not accurate because all of the heap allocation will happen in another separated allocator named HEAP (Sub-task, Resolved, Zheng Hu)
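
Several of the sub-tasks above revolve around three ideas: a pooled allocator for off-heap buffers, a fallback to heap allocation when the pool is exhausted, and a reference count that returns a buffer to the pool once the last user releases it. The following is a minimal sketch of how those ideas can fit together. It is an assumption-laden illustration, not the actual ByteBuffAllocator or ByteBuff code; the names RefCountedPoolSketch, Block, allocate, retain, release and available are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a pooled, reference-counted buffer allocator; not HBase code. */
class RefCountedPoolSketch {
  private final ConcurrentLinkedQueue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
  private final int bufferSize;

  RefCountedPoolSketch(int bufferSize, int maxBuffers) {
    this.bufferSize = bufferSize;
    for (int i = 0; i < maxBuffers; i++) {
      pool.add(ByteBuffer.allocateDirect(bufferSize));
    }
  }

  /** Takes a pooled off-heap buffer, or falls back to a heap buffer if the pool is empty. */
  Block allocate() {
    ByteBuffer b = pool.poll();
    return b != null ? new Block(b, true) : new Block(ByteBuffer.allocate(bufferSize), false);
  }

  /** Number of off-heap buffers currently sitting in the pool. */
  int available() {
    return pool.size();
  }

  /** A buffer plus a reference count; a new block starts with one reference. */
  final class Block {
    final ByteBuffer buf;
    private final boolean pooled;
    private final AtomicInteger refCnt = new AtomicInteger(1);

    Block(ByteBuffer buf, boolean pooled) {
      this.buf = buf;
      this.pooled = pooled;
    }

    /** Adds a reference; fails if the block has already been fully released. */
    Block retain() {
      for (;;) {
        int c = refCnt.get();
        if (c <= 0) {
          throw new IllegalStateException("block already released");
        }
        if (refCnt.compareAndSet(c, c + 1)) {
          return this;
        }
      }
    }

    /** Drops a reference; returns true if this call released the last one. */
    boolean release() {
      int c = refCnt.decrementAndGet();
      if (c < 0) {
        throw new IllegalStateException("block released too many times");
      }
      if (c == 0 && pooled) {
        buf.clear();
        pool.add(buf); // only pooled off-heap buffers go back; heap ones are left to the GC
      }
      return c == 0;
    }
  }
}
```

In this sketch a heap-allocated block needs no pool bookkeeping at all, which is in the spirit of the sub-task about separating heap and offheap blocks.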


              People

              • Assignee: Zheng Hu (openinx)
              • Reporter: Zheng Hu (openinx)
              • Votes: 0
              • Watchers: 21
