Apache Ozone / HDDS-1880 Decommissioning and maintenance mode in Ozone / HDDS-2113

Update JMX metrics for node count in SCMNodeMetrics for Decommission and Maintenance


Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.5.0
    • None
    • SCM

    Description

      Currently the class SCMNodeMetrics exposes JMX metrics for the number of HEALTHY, STALE and DEAD nodes.

      It also exposes the disk capacity of the cluster and the amount of space used and available.

      We need to decide how to present nodes in JMX when they are entering maintenance, in maintenance, decommissioning, or decommissioned.

      We now have 15 states rather than the previous 3, as we can have nodes in:

      • IN_SERVICE
      • ENTERING_MAINTENANCE
      • IN_MAINTENANCE
      • DECOMMISSIONING
      • DECOMMISSIONED

      And in each of these states, nodes can be:

      • HEALTHY
      • STALE
      • DEAD

      The simplest approach would be to expose these 15 states directly in JMX, as that gives the complete picture, but I wonder whether we also need any summary JMX metrics.
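      As a rough illustration of exposing all 15 combinations, the following sketch counts nodes per (state, health) pair and derives one counter name for each. It is not the SCMNodeMetrics implementation: the class `NodeStateCounts`, the nested `Node` type, and the metric naming scheme (for example `InServiceHealthyNodes`) are all hypothetical and chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class NodeStateCounts {
  // The five states and three health values listed in the description.
  enum OpState { IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE, DECOMMISSIONING, DECOMMISSIONED }
  enum Health { HEALTHY, STALE, DEAD }

  /** Hypothetical node record holding only the two dimensions discussed here. */
  static final class Node {
    final OpState op;
    final Health health;
    Node(OpState op, Health health) {
      this.op = op;
      this.health = health;
    }
  }

  /** Converts e.g. "ENTERING_MAINTENANCE" to "EnteringMaintenance". */
  static String toCamel(String upperSnake) {
    StringBuilder sb = new StringBuilder();
    for (String part : upperSnake.split("_")) {
      sb.append(part.charAt(0)).append(part.substring(1).toLowerCase(Locale.ROOT));
    }
    return sb.toString();
  }

  /** Hypothetical naming scheme: one counter name per (state, health) pair. */
  static String metricName(OpState op, Health health) {
    return toCamel(op.name()) + toCamel(health.name()) + "Nodes";
  }

  /** Returns all 15 counters, including those that are zero. */
  static Map<String, Integer> count(List<Node> nodes) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (OpState op : OpState.values()) {
      for (Health health : Health.values()) {
        counts.put(metricName(op, health), 0);
      }
    }
    for (Node n : nodes) {
      counts.merge(metricName(n.op, n.health), 1, Integer::sum);
    }
    return counts;
  }
}
```

      Pre-populating every pair with zero means all 15 counters are always present, so a dashboard would not need to treat a missing metric as zero. Summary metrics, if wanted, could be derived by summing over one of the two dimensions.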


      We also need to consider how to count disk capacity and usage. For example:

      1. Do we count capacity and usage on a DECOMMISSIONING node? There is no clear-cut answer, as a decommissioning node does not provide any capacity for writers in the cluster, but it does use capacity.
      2. For a DECOMMISSIONED node, we probably should not count capacity or usage.
      3. For an ENTERING_MAINTENANCE node, do we count capacity and usage? I suspect we should include the capacity and usage in the totals; however, a node in this state will not be available for writes.
      4. For an IN_MAINTENANCE node that is healthy, do we count capacity and usage?
      5. For an IN_MAINTENANCE node that is dead, do we count capacity and usage?
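      To make the trade-offs above concrete, here is a minimal sketch in which the inclusion rule is a pluggable predicate, so different answers to these questions can be compared side by side. Everything in it is an assumption for discussion: the class `CapacityTotals`, its `Node` and `Totals` types, and the two example policies are hypothetical and do not represent a decision on this issue.

```java
import java.util.List;
import java.util.function.Predicate;

final class CapacityTotals {
  enum OpState { IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE, DECOMMISSIONING, DECOMMISSIONED }
  enum Health { HEALTHY, STALE, DEAD }

  /** Hypothetical node record with capacity and usage in bytes. */
  static final class Node {
    final OpState op;
    final Health health;
    final long capacity;
    final long used;
    Node(OpState op, Health health, long capacity, long used) {
      this.op = op;
      this.health = health;
      this.capacity = capacity;
      this.used = used;
    }
  }

  static final class Totals {
    final long capacity;
    final long used;
    Totals(long capacity, long used) {
      this.capacity = capacity;
      this.used = used;
    }
    long available() {
      return capacity - used;
    }
  }

  /** Example policy (assumption): count every node except DECOMMISSIONED ones. */
  static final Predicate<Node> NOT_DECOMMISSIONED = n -> n.op != OpState.DECOMMISSIONED;

  /** Example policy (assumption): only IN_SERVICE, HEALTHY nodes offer capacity to writers. */
  static final Predicate<Node> WRITABLE =
      n -> n.op == OpState.IN_SERVICE && n.health == Health.HEALTHY;

  /** Sums capacity and usage over the nodes accepted by the given policy. */
  static Totals sum(List<Node> nodes, Predicate<Node> include) {
    long capacity = 0;
    long used = 0;
    for (Node n : nodes) {
      if (include.test(n)) {
        capacity += n.capacity;
        used += n.used;
      }
    }
    return new Totals(capacity, used);
  }
}
```

      Reporting two totals, one for all counted nodes and one for writable nodes only, is one way to handle cases such as DECOMMISSIONING and ENTERING_MAINTENANCE, where a node still uses capacity but is not available for writes.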

      I would welcome any thoughts on this before changing the code.


            People

              sodonnell Stephen O'Donnell
              sodonnell Stephen O'Donnell

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m