[HDDS-11461] Improve the impact of DataNode I/O - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Ozone Datanode
Labels: None

Description

Our object storage service is built on Ozone and currently runs on more than 3,000 nodes across multiple clusters. Customers have strict requirements for the P99 latency of requests to our system.

Under normal circumstances, reading 200 bytes of data takes 10 ms to 20 ms. However, monitoring data occasionally shows the same 200-byte read taking up to 500 ms.

Investigating the DataNode (DN) side, we found that when the machine hosting the DN experiences high I/O wait or high system load, the latency of requests served by the DN degrades.

High I/O wait or system load can have many causes, including DataScanner scans, EC block recovery, and replication work for containers in the UnderReplicated state.

We aim to design a mechanism that lets the DN detect, at least approximately, degraded I/O conditions on its host (such as high system load, high I/O wait, a slow network, or a slow disk) and report this data to the Storage Container Manager (SCM).
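As a minimal sketch of the iowait part of this sensing, assuming a Linux host, a DN could sample the aggregate `cpu` line of `/proc/stat` twice and compute the share of CPU time spent waiting on I/O between the two samples. The class and method names (`IoWaitSampler`, `parse`, `ioWaitFraction`, `sample`) are hypothetical and not part of Ozone; system load, network, and per-disk latency would each need their own probes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not an Ozone class: derives the share of CPU time spent
// in iowait from two samples of the aggregate "cpu" line in Linux /proc/stat.
final class IoWaitSampler {
    private IoWaitSampler() { }

    /** Cumulative jiffy counters taken from one "cpu" line. */
    static final class CpuTimes {
        final long iowait;
        final long total;

        CpuTimes(long iowait, long total) {
            this.iowait = iowait;
            this.total = total;
        }
    }

    /** Parses "cpu user nice system idle iowait irq softirq steal [guest guest_nice]". */
    static CpuTimes parse(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+");
        if (f.length < 9 || !f[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + cpuLine);
        }
        long total = 0;
        // Sum user..steal (fields 1-8); guest time is already included in user/nice.
        for (int i = 1; i <= 8; i++) {
            total += Long.parseLong(f[i]);
        }
        return new CpuTimes(Long.parseLong(f[5]), total);
    }

    /** Fraction (0.0 to 1.0) of CPU time spent in iowait between two samples. */
    static double ioWaitFraction(CpuTimes before, CpuTimes after) {
        long elapsed = after.total - before.total;
        if (elapsed <= 0) {
            return 0.0; // no time elapsed (or counters reset): report no iowait
        }
        return (double) (after.iowait - before.iowait) / elapsed;
    }

    /** On Linux the first line of /proc/stat is the aggregate "cpu" line. */
    static CpuTimes sample() throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("/proc/stat"));
        return parse(lines.get(0));
    }
}
```

A DN could call `sample()` on a fixed interval and report the resulting fraction to SCM; which channel carries that report is left open here.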

This data can then be used to improve the system's behavior as follows:

When a DN detects high I/O or degraded read/write performance:

Automatically reduce the rate of DataScanner scans.
If a specific disk's performance deteriorates, skip that disk during data writes.
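The two DN-side reactions above could be sketched as follows, assuming an iowait fraction from a host-level probe and a per-volume average write latency are already available. `DnIoPolicy`, the linear throttling curve, and all threshold parameters are illustrative assumptions, not existing Ozone classes or configuration keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical DataNode-side policy; names and thresholds are illustrative,
// not existing Ozone classes or configuration keys.
final class DnIoPolicy {
    private DnIoPolicy() { }

    /**
     * Scanner bandwidth budget (bytes/s): maxBps at or below lowWater iowait,
     * minBps at or above highWater, linear interpolation in between.
     */
    static long scanBandwidth(double ioWait, long maxBps, long minBps,
                              double lowWater, double highWater) {
        if (ioWait <= lowWater) {
            return maxBps;
        }
        if (ioWait >= highWater) {
            return minBps;
        }
        double t = (ioWait - lowWater) / (highWater - lowWater);
        return Math.round(maxBps - t * (maxBps - minBps));
    }

    /**
     * Volumes whose recent average write latency is within thresholdMs. If every
     * volume is slow, all are returned so the DN does not become unwritable.
     */
    static List<String> writableVolumes(Map<String, Double> avgWriteLatencyMs,
                                        double thresholdMs) {
        List<String> healthy = new ArrayList<>();
        for (Map.Entry<String, Double> e : avgWriteLatencyMs.entrySet()) {
            if (e.getValue() <= thresholdMs) {
                healthy.add(e.getKey());
            }
        }
        return healthy.isEmpty() ? new ArrayList<>(avgWriteLatencyMs.keySet()) : healthy;
    }
}
```

Falling back to all volumes when every disk looks slow is a deliberate choice in this sketch: skipping slow disks should shed load, not make the DN reject writes entirely.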

When the SCM detects high I/O or degraded read/write performance on DNs:

Issue commands to bypass these poorly performing DNs.
When returning a list of DNs to clients for data reads, place the degraded DNs at the end of the list.
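For the read path, one simple way to place degraded DNs at the end of the list is a stable sort keyed on whether each node is degraded, which preserves any existing order (for example, a topology-based one) within the healthy and degraded groups. `ReadOrder.degradedLast` and the use of plain `String` node IDs are assumptions for illustration; a real implementation would operate on Ozone's own datanode objects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical SCM-side helper: node IDs are plain strings for illustration;
// a real implementation would operate on Ozone's own datanode objects.
final class ReadOrder {
    private ReadOrder() { }

    /** Returns a copy of nodes with every degraded node moved to the end. */
    static List<String> degradedLast(List<String> nodes, Set<String> degraded) {
        List<String> ordered = new ArrayList<>(nodes);
        // ArrayList.sort is stable and Boolean.FALSE sorts before Boolean.TRUE, so
        // healthy nodes come first and each group keeps its original relative order.
        ordered.sort(Comparator.comparing((String node) -> degraded.contains(node)));
        return ordered;
    }
}
```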

Issue Links

relates to:

HDDS-10712 Datanode Chunk, Block and Volume IO Dashboard (Open)

HDDS-11341 Add Grafana dashboard for HDDS health and replication progress (Open)

Sub-Tasks

1. Enhancing DataNode I/O Monitoring Capabilities (Resolved, Shilun Fan)
2. Track and display failed DataNode storage locations in SCM (In Progress, Shilun Fan)
3. Collect iowait and system on the node (In Progress, Shilun Fan)

People

Assignee: Shilun Fan

Reporter: Shilun Fan

Votes: 0

Watchers: 3

Dates

Created: 16/Sep/24 03:34

Updated: 16/Sep/24 18:01