AMBARI-15303

New Alerts Do Not Honor Existing Maintenance Mode Setting

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.2.0
    • Component/s: ambari-server
    • Labels: None

    Description

      Alerts are "suppressed" in maintenance mode by carrying a maintenance_state attribute in addition to the actual state being reported:

            "Alert": {
              "cluster_name": "c1",
              "component_name": "METRICS_COLLECTOR",
              "definition_id": 43,
              "definition_name": "ams_metrics_collector_process",
              "host_name": "c6401.ambari.apache.org",
              "id": 28,
              "instance": null,
              "label": "Metrics Collector Process",
              "latest_timestamp": 1457108946118,
              "maintenance_state": "ON",
              "original_timestamp": 1457108646099,
              "scope": "ANY",
              "service_name": "AMBARI_METRICS",
              "state": "CRITICAL",
              "text": "Connection failed: [Errno 111] Connection refused to c6401.ambari.apache.org"
            }
      

      When a host, service, or component is placed into maintenance mode (MM), the database is updated so that every affected alert_current row has its maintenance state updated as well.

      However, this fails under two scenarios:

      • The alert hasn't been received yet in a brand new cluster
      • The alert definition was disabled, which removed all of its current alerts, and was then re-enabled.

      In both cases, no alert_current row existed when MM was turned on, so there was nothing to update. When constructing a new AlertCurrentEntity, we need to calculate the correct maintenance state.
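
      The idea can be illustrated with a minimal Java sketch. It is not the code from the attached patch: MaintenanceState, CurrentAlert, and MaintenanceLookup are hypothetical stand-ins, and the rule that an alert is suppressed when its host, service, or host component is in MM is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: these types are illustrative stand-ins, not Ambari's classes.
enum MaintenanceState { OFF, ON }

/** Simplified stand-in for a newly created alert_current row. */
final class CurrentAlert {
    final String hostName;       // may be null for alerts not tied to a host
    final String serviceName;
    final String componentName;  // may be null for service-level alerts
    final MaintenanceState maintenanceState;

    CurrentAlert(String hostName, String serviceName, String componentName,
                 MaintenanceState maintenanceState) {
        this.hostName = hostName;
        this.serviceName = serviceName;
        this.componentName = componentName;
        this.maintenanceState = maintenanceState;
    }
}

/** Tracks which hosts, services, and host components are currently in MM. */
final class MaintenanceLookup {
    private final Set<String> hosts;
    private final Set<String> services;
    private final Set<String> hostComponents; // keys of the form "host/component"

    MaintenanceLookup(Set<String> hosts, Set<String> services, Set<String> hostComponents) {
        // Copy into HashSet so that contains(null) is safe and returns false.
        this.hosts = new HashSet<>(hosts);
        this.services = new HashSet<>(services);
        this.hostComponents = new HashSet<>(hostComponents);
    }

    /** Assumed rule: MM on the host, the service, or the host component suppresses the alert. */
    MaintenanceState effectiveState(String host, String service, String component) {
        boolean suppressed = hosts.contains(host)
                || services.contains(service)
                || (host != null && component != null
                    && hostComponents.contains(host + "/" + component));
        return suppressed ? MaintenanceState.ON : MaintenanceState.OFF;
    }

    /** Computes the maintenance state at creation time instead of defaulting to OFF. */
    CurrentAlert newAlert(String host, String service, String component) {
        return new CurrentAlert(host, service, component,
                effectiveState(host, service, component));
    }
}
```

      With this approach, an alert that first arrives after its host was placed into MM is created with maintenance_state ON, which covers both the brand-new cluster case and the re-enabled definition case.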

      Attachments

        1. AMBARI-15303.patch
          16 kB
          Jonathan Hurley


            People

              Assignee: Jonathan Hurley (jonathanhurley)
              Reporter: Jonathan Hurley (jonathanhurley)