[HDDS-11341] Add Grafana dashboard for HDDS health and replication progress - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Ozone Dashboards
Labels: None

Description

Add a Grafana dashboard that shows datanode health, ongoing and pending replication and reconstruction tasks, and the amount of data being moved between nodes by these tasks. The dashboard will be useful for monitoring the cluster during disk failure, node failure, node decommissioning, and maintenance.

The SCM replication manager likely already has many of the metrics for ongoing tasks. We may need to add more datanode metrics to track tasks that are ongoing (not just those that are queued) and the amount of data being moved. I think some datanode command queue and handler metrics are also unused; those can be checked, removed, or updated as part of this PR.
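
For illustration only, the distinction between queued and ongoing tasks, plus the amount of data moved, could be modeled roughly as below. This is a language-neutral sketch written in Python, not Ozone code: the class name `ReplicationTaskMetrics`, its methods, and the counter names are all hypothetical and do not correspond to existing Ozone metrics.

```python
import threading


class ReplicationTaskMetrics:
    """Hypothetical per-datanode counters (illustrative names, not Ozone's)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.queued = 0        # tasks accepted but not yet started
        self.in_flight = 0     # tasks currently executing
        self.completed = 0     # tasks that finished successfully
        self.failed = 0        # tasks that finished with an error
        self.bytes_moved = 0   # total bytes transferred by finished tasks

    def task_queued(self):
        with self._lock:
            self.queued += 1

    def task_started(self):
        # A task leaves the queue and becomes an ongoing task.
        with self._lock:
            self.queued -= 1
            self.in_flight += 1

    def task_finished(self, bytes_moved, success=True):
        # Record the outcome and the data volume of one ongoing task.
        with self._lock:
            self.in_flight -= 1
            self.bytes_moved += bytes_moved
            if success:
                self.completed += 1
            else:
                self.failed += 1

    def snapshot(self):
        # Consistent point-in-time view, e.g. for a metrics exporter to scrape.
        with self._lock:
            return {
                "queued": self.queued,
                "in_flight": self.in_flight,
                "completed": self.completed,
                "failed": self.failed,
                "bytes_moved": self.bytes_moved,
            }
```

Keeping `queued` and `in_flight` as separate gauges is what would let a dashboard panel show work that is actually running rather than only the backlog, while `bytes_moved` behaves like a monotonically increasing counter from which a transfer rate can be derived.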

Issue Links

Dependent
HDDS-11376 Improve ReplicationSupervisor to record replication metrics (Resolved)

is related to
HDDS-11461 Improve the impact of DataNode I/O (In Progress)
HDDS-11481 Enhanced SCM Support for DataNode Management (Open)

People

Assignee: Unassigned

Reporter: Ethan Rose

Votes: 0

Watchers: 3

Dates

Created: 19/Aug/24 19:16

Updated: 03/Oct/24 19:37