Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Add a Grafana dashboard that shows datanode health, ongoing and pending replication and reconstruction tasks, and the amount of data being moved between nodes by these tasks. This dashboard will be useful to monitor during disk failure, node failure, node decommissioning, and maintenance.
The SCM replication manager likely already has many of the metrics for ongoing tasks. We may need to add more metrics to datanodes to monitor tasks that are in progress (not just those that are queued) and the amount of data being moved. I think some datanode command queue and handler metrics are also unused; those can be checked, removed, or updated as part of this PR.
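As a rough illustration of the datanode-side gap described above, here is a minimal sketch of counters that separate queued tasks from in-progress ones and track bytes moved. The class name `ReplicationTaskMetricsSketch` and all of its methods are hypothetical; this is not an existing Ozone class and it does not use the Hadoop metrics framework that a real change would presumably register with.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch only: plain JDK counters that distinguish queued
 * tasks from in-flight tasks and record the bytes those tasks move.
 */
class ReplicationTaskMetricsSketch {
  // Gauges: current number of tasks in each state.
  private final AtomicLong queuedTasks = new AtomicLong();
  private final AtomicLong inFlightTasks = new AtomicLong();
  // Monotonic counters: only ever increase.
  private final LongAdder bytesTransferred = new LongAdder();
  private final LongAdder completedTasks = new LongAdder();
  private final LongAdder failedTasks = new LongAdder();

  /** Call when a replication or reconstruction command is enqueued. */
  void taskQueued() {
    queuedTasks.incrementAndGet();
  }

  /** Call when a worker picks the task up: it moves from queued to in-flight. */
  void taskStarted() {
    queuedTasks.decrementAndGet();
    inFlightTasks.incrementAndGet();
  }

  /** Call as data is copied, with the number of bytes just moved. */
  void bytesMoved(long bytes) {
    bytesTransferred.add(bytes);
  }

  /** Call when the task ends, successfully or not. */
  void taskFinished(boolean success) {
    inFlightTasks.decrementAndGet();
    if (success) {
      completedTasks.increment();
    } else {
      failedTasks.increment();
    }
  }

  long getQueuedTasks() { return queuedTasks.get(); }
  long getInFlightTasks() { return inFlightTasks.get(); }
  long getBytesTransferred() { return bytesTransferred.sum(); }
  long getCompletedTasks() { return completedTasks.sum(); }
  long getFailedTasks() { return failedTasks.sum(); }
}
```

Keeping the in-flight gauge separate from the queue gauge is what would let a dashboard panel show work that is actually running, and the byte counter could be graphed as a rate to approximate data movement between nodes.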
Issue Links
- Dependent
  - HDDS-11376 Improve ReplicationSupervisor to record replication metrics (Resolved)
- is related to
  - HDDS-11461 Improve the impact of DataNode I/O (In Progress)
  - HDDS-11481 Enhanced SCM Support for DataNode Management (Open)