[HDFS-5664] try to relieve the BlockReaderLocal read() synchronized hotspot - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.2.0, 3.0.0-alpha1
Fix Version/s: None
Component/s: hdfs-client
Labels: None

Description

Currently, BlockReaderLocal's read method has a synchronized modifier:

public synchronized int read(byte[] buf, int off, int len) throws IOException {

In an HBase cluster with heavy physical reads, we observed some hotspots in the dfsclient path. The detailed trace can be found at: https://issues.apache.org/jira/browse/HDFS-1605?focusedCommentId=13843241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13843241

I haven't looked into the details yet, so here are some raw ideas first:
1) Replace synchronized with a try-lock-with-timeout pattern, so a read can fail fast.
2) Fall back to non-ssr mode if acquiring the local reader lock fails.
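
A rough sketch of how the two ideas above could fit together, not a patch against the real BlockReaderLocal. The names SimpleReader and TryLockReader, the fallback reader, and the timeout parameter are all hypothetical and exist only for illustration; the only real API used is java.util.concurrent.locks.ReentrantLock.tryLock(long, TimeUnit).

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a block reader; NOT the real HDFS BlockReader interface. */
interface SimpleReader {
    int read(byte[] buf, int off, int len) throws IOException;
}

/**
 * Hypothetical wrapper: instead of a synchronized read(), wait for the lock
 * only up to a timeout, and use a fallback reader if the lock is not acquired.
 */
class TryLockReader implements SimpleReader {
    private final ReentrantLock lock = new ReentrantLock();
    private final SimpleReader local;     // assumption: stands in for the local (short-circuit) reader
    private final SimpleReader fallback;  // assumption: stands in for the non-ssr read path
    private final long timeoutMillis;     // assumption: how long a caller is willing to wait

    TryLockReader(SimpleReader local, SimpleReader fallback, long timeoutMillis) {
        this.local = local;
        this.fallback = fallback;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        boolean acquired;
        try {
            // Idea 1: bounded wait instead of blocking indefinitely on a monitor.
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new InterruptedIOException("interrupted while waiting for the local reader lock");
        }
        if (!acquired) {
            // Idea 2: the lock is still held by another reader, so take the other path.
            return fallback.read(buf, off, len);
        }
        try {
            return local.read(buf, off, len);
        } finally {
            lock.unlock();
        }
    }
}
```

Note that this sketch ignores reader position state: a real fallback would need its own independent reader positioned at the same offset, since the lock holder may be in the middle of advancing the local reader.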
There are at least two scenarios where removing this hotspot would help:
1) Heavy local physical reads, e.g. a high HBase block cache miss ratio.
2) A slow or bad disk.
This should help achieve a lower 99th percentile HBase read latency.

People

Assignee: Unassigned

Reporter: Liang Xie

Votes: 0

Watchers: 7

Dates

Created: 13/Dec/13 04:28

Updated: 12/Jun/22 00:00

Resolved: 12/Jun/22 00:00