[HDFS-16521] DFS API to retrieve slow datanodes - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0, 3.3.5
Fix Version/s: 3.4.0, 3.3.5
Component/s: datanode, dfsclient
Labels:
- pull-request-available

Target Version/s: 3.4.0, 3.3.5
Hadoop Flags: Reviewed

Description

A DFS API to retrieve slow datanodes would make it possible to add an option to "dfsadmin -report" that lists slow datanode info for operators to review. Such a filter is especially useful on larger clusters.

The API would also let HDFS downstreamers that can reach only the namenode RPC port, and not its HTTP port, retrieve slow nodes.

Moreover, FanOutOneBlockAsyncDFSOutput in HBase currently has to rely on its own way of marking and excluding slow nodes while 1) creating pipelines and 2) handling acks, based on factors such as the data length of the packet, the processing time relative to the last ack timestamp, and whether the flush to replicas has finished. If it could use a slow node API from HDFS to exclude nodes appropriately while writing a block, much of its own post-ack computation of slow nodes could be saved or improved. Alternatively, further experiments might point to a better way to manage slow node detection logic in both HDFS and HBase. To collect more data points and run more POCs in this area, HDFS should provide an API that lets downstreamers use slow node info efficiently for such critical low-latency use cases (like writing WALs).
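
To illustrate the exclusion step described above, here is a minimal sketch of how a client could drop reported slow nodes from a pipeline candidate list. It does not use any Hadoop or HBase classes. The class name SlowNodeFilter, the method excludeSlowNodes, and the "host:port" node strings are all hypothetical, and the slow node set is assumed to have been fetched already through whatever API HDFS ends up exposing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HDFS or HBase.
public class SlowNodeFilter {

    /**
     * Returns the candidates in their original order, minus every node
     * that appears in the reported slow node set.
     */
    public static List<String> excludeSlowNodes(List<String> candidates,
                                                Set<String> slowNodes) {
        List<String> healthy = new ArrayList<>();
        for (String node : candidates) {
            if (!slowNodes.contains(node)) {
                healthy.add(node);
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        // Assumed inputs: pipeline candidates and a slow node report.
        List<String> candidates =
            Arrays.asList("dn1:9866", "dn2:9866", "dn3:9866", "dn4:9866");
        Set<String> slowNodes = new HashSet<>(Arrays.asList("dn2:9866"));

        System.out.println(excludeSlowNodes(candidates, slowNodes));
    }
}
```

A real client would also need a fallback for the case where excluding slow nodes leaves too few candidates to build a pipeline. That policy is left out of this sketch.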

Issue Links

causes
- HDFS-17017 Fix the issue of arguments number limit in report command in DFSAdmin. (Resolved)

relates to
- HDFS-16582 Expose aggregate latency of slow node as perceived by the reporting node (Resolved)
- HDFS-16595 Slow peer metrics - add median, mad and upper latency limits (Resolved)

links to
- GitHub Pull Request #4107
- GitHub Pull Request #4259

People

Assignee: Viraj Jasani

Reporter: Viraj Jasani

Votes: 0

Watchers: 4

Dates

Created: 25/Mar/22 16:26

Updated: 27/Jan/24 01:28

Resolved: 05/May/22 20:56

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 7h 10m