HBASE-25460: Expose drainingServers as cluster metric


    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Exposed two new JMX metrics, "draininigRegionServers" and "numDrainingRegionServers", which report the names of RegionServers in draining mode (as a comma-separated list) and the number of such RegionServers, respectively.

    Description

    For some reason, a significantly high number of our servers were put in decommissioned mode and stayed in that state for a long time, serving no regions at all. This put heavy load on the remaining live servers, and by the time anyone recognized that the cluster was improperly balanced, it was too late. As expected, balancing such a cluster, with or without runMaxSteps, can cause a sudden spike of regions in transition (RITs) proportional to the degree of region imbalance in the cluster.

    Although such a situation is rare, exposing a metric lets us take precautions. We should expose the list of draining RegionServers as a JMX metric, just as we already expose liveRegionServers and deadRegionServers. Such a metric can help configure alerts with a threshold on the percentage of total RegionServers allowed to be in draining mode at any time (e.g. during rolling upgrades).
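    A minimal sketch of the percentage-based alert described above, in Java since HBase is a Java project. It assumes the alert compares numDrainingRegionServers against the master's numRegionServers metric as the denominator (draining servers are still online, so they are assumed to be counted there). The class DrainingAlert and its methods are hypothetical and not part of HBase; how the two values are collected from JMX is left to your monitoring system.

```java
import java.util.Locale;

/**
 * Hypothetical alert check (not part of HBase): flags the cluster when the
 * share of RegionServers in draining mode exceeds a configured percentage.
 */
public class DrainingAlert {

    /** Percentage of RegionServers currently in draining mode. */
    static double drainingPercent(long numDraining, long numRegionServers) {
        if (numRegionServers <= 0) {
            throw new IllegalArgumentException("numRegionServers must be positive");
        }
        return 100.0 * numDraining / numRegionServers;
    }

    /** True when the draining share is strictly above the allowed threshold. */
    static boolean shouldAlert(long numDraining, long numRegionServers, double maxPercent) {
        return drainingPercent(numDraining, numRegionServers) > maxPercent;
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for the JMX metrics
        // numDrainingRegionServers and numRegionServers (assumed denominator).
        long numDraining = 12;
        long numRegionServers = 100;
        // Assumed policy: allow at most 10% of RegionServers to drain at once,
        // e.g. during a rolling upgrade.
        double maxPercent = 10.0;
        System.out.printf(Locale.ROOT, "draining=%.1f%% alert=%b%n",
            drainingPercent(numDraining, numRegionServers),
            shouldAlert(numDraining, numRegionServers, maxPercent));
    }
}
```

    With 12 of 100 RegionServers draining and a 10% limit, main prints draining=12.0% alert=true. The strict > comparison means that sitting exactly at the limit does not alert; use >= if the threshold should be inclusive.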


    People

    • Assignee:
      rkrahul324 Rahul Kumar
    • Reporter:
      vjasani Viraj Jasani
