  Apache Ozone / HDDS-11341

Add Grafana dashboard for HDDS health and replication progress

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Ozone Dashboards
    • Labels: None

    Description

      Add a Grafana dashboard that shows datanode health, ongoing and pending replication and reconstruction tasks, and the amount of data being moved between nodes by these tasks. This dashboard will be useful to monitor during disk failure, node failure, node decommissioning, and maintenance.

      The SCM replication manager likely already exposes many of the metrics for ongoing tasks. We may need to add more metrics to datanodes to monitor tasks that are in progress (not just those that are queued) and the amount of data being moved. I think some datanode command queue and handler metrics are also unused; those can be checked, removed, or updated as part of this PR.
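
      As a rough illustration of the kind of board described above, the sketch below generates a minimal Grafana dashboard model with one time series panel per area of interest. The metric names are hypothetical placeholders (prefixed `example_`), not metrics that SCM or the datanodes are known to expose, and the `build_dashboard` helper is likewise only an assumption for illustration; the real dashboard would use whatever metrics already exist or are added for this issue.

```python
import json

# Hypothetical Prometheus expressions, used only for illustration.
# The real metric names exposed by SCM and datanodes may differ.
PANELS = [
    ("Datanode health", "example_scm_datanodes_by_state"),
    ("Pending replication tasks", "example_scm_replication_pending"),
    ("Inflight reconstruction tasks", "example_scm_reconstruction_inflight"),
    ("Bytes moved by replication",
     "rate(example_datanode_replication_bytes_total[5m])"),
]


def build_dashboard(title, panels, columns=2):
    """Return a dashboard model with one time series panel per entry."""
    width = 24 // columns  # Grafana's panel grid is 24 units wide
    height = 8
    out = []
    for i, (panel_title, expr) in enumerate(panels):
        out.append({
            "id": i + 1,
            "type": "timeseries",
            "title": panel_title,
            # Lay panels out left to right, wrapping after `columns` panels.
            "gridPos": {
                "x": (i % columns) * width,
                "y": (i // columns) * height,
                "w": width,
                "h": height,
            },
            "targets": [{"refId": "A", "expr": expr}],
        })
    return {"title": title, "panels": out}


if __name__ == "__main__":
    dashboard = build_dashboard("HDDS health and replication", PANELS)
    print(json.dumps(dashboard, indent=2))
```

      Generating the JSON from a small list keeps the panel layout consistent and makes it easy to swap in real metric names once they are confirmed.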

            People

              Assignee: Unassigned
              Reporter: Ethan Rose (erose)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: