[HDFS-14535] The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is causing lots of heap allocation in HBase when using short-circuit read - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.10.0, 3.0.4, 3.3.0, 2.8.6, 3.2.1, 2.9.3, 3.1.3
Component/s: hdfs-client
Labels:
None

Hadoop Flags:

Reviewed

Description

Our HBase team is trying to read blocks from HDFS directly into pooled off-heap ByteBuffers (~~HBASE-21879~~). In a recent benchmark we found that almost 45% of the heap allocation came from the DFS client. The heap allocation flame graph can be seen here: https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg

After checking the code path, we found that when requesting file descriptors from a DomainPeer, we allocate the default 8KB buffer for the BufferedOutputStream, even though the protocol content is quite small, just a few bytes.

This put heavy GC pressure on HBase when cacheHitRatio < 60%, which increased the HBase P999 latency. Instead, we can pre-allocate a small buffer for the BufferedOutputStream, such as 512 bytes, which is enough to hold the short-circuit fd protocol content. We've created a patch that does this, and the allocation flame graph shows that with the patch the heap allocation from the DFS client dropped from 45% to 27%, which I think is a very good result. See: https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
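To illustrate the idea (this is not the attached patch), here is a minimal Java sketch that compares the buffer allocated by the default `BufferedOutputStream` constructor with one sized explicitly. The class `SmallBufferSketch`, the `ProbeStream` helper, the constant `SMALL_BUFFER_SIZE`, and the 6-byte stand-in payload are all hypothetical names and values chosen for the example; the real short-circuit request is built by the DFS client and is not reproduced here.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class SmallBufferSketch {
    // Hypothetical constant: 512 bytes is the size suggested in the description.
    static final int SMALL_BUFFER_SIZE = 512;

    /** Subclass that exposes the protected buf field so its capacity can be inspected. */
    static final class ProbeStream extends BufferedOutputStream {
        ProbeStream(OutputStream out) { super(out); }
        ProbeStream(OutputStream out, int size) { super(out, size); }
        int capacity() { return buf.length; }
    }

    /** Capacity allocated when no size is passed to the constructor. */
    static int defaultCapacity() {
        return new ProbeStream(new ByteArrayOutputStream()).capacity();
    }

    /** Capacity allocated when the small size is passed explicitly. */
    static int smallCapacity() {
        return new ProbeStream(new ByteArrayOutputStream(), SMALL_BUFFER_SIZE).capacity();
    }

    /**
     * Writes a tiny stand-in request (NOT the real HDFS wire format) through a
     * buffer of the given size and returns the bytes that reached the sink.
     */
    static byte[] sendTinyRequest(int bufferSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out =
                 new DataOutputStream(new ProbeStream(sink, bufferSize))) {
            out.writeShort(1); // placeholder 2-byte header
            out.writeInt(42);  // placeholder 4-byte body
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("default buffer capacity: " + defaultCapacity());
        System.out.println("small buffer capacity:   " + smallCapacity());
        System.out.println("bytes delivered via small buffer: "
            + sendTinyRequest(SMALL_BUFFER_SIZE).length);
    }
}
```

Because the request is only a few bytes, the smaller buffer delivers the same bytes to the underlying stream; only the size of the per-request temporary array changes.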

We hope the attached patch can be merged into HDFS trunk and also into Hadoop-2.8.x; HBase will benefit a lot from this.

Thanks.

For more details, see: https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639

Attachments

HDFS-14535.patch
01/Jun/19 04:02
2 kB
Zheng Hu

Issue Links

is related to

HDFS-14820 The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

Resolved

relates to

HBASE-22387 Evaluate the get/scan performance after reading HFile block into offheap directly

Closed

HBASE-21879 Read HFile's block to ByteBuffer directly instead of to byte[] for reducing young gc purpose

Closed

links to

GitHub Pull Request #899

People

Assignee: Zheng Hu

Reporter: Zheng Hu

Votes: 0

Watchers: 10

Dates

Created: 01/Jun/19 04:01

Updated: 13/Aug/20 09:15

Resolved: 17/Jun/19 14:49