Ambari / AMBARI-8569

Alert JSON Files Need Descriptions

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: alerts
    • Labels: None

    Description

      BUG-28018 adds a new description field to an alert definition. The alerts.json files for every service in every stack should be updated to have this field for each alert definition.
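As a rough way to track this task, the sketch below walks a parsed alerts.json document and reports every alert definition that still lacks a description. It assumes a layout of service name, then component name, then a list of definition objects with `name` and `description` keys; that layout, the function name `find_missing_descriptions`, and the sample data are illustrative assumptions, not taken from this issue.

```python
import json


def find_missing_descriptions(alerts):
    """Return (service, component, alert_name) for each definition
    that has no non-empty "description" string.

    Assumed layout (hypothetical, not confirmed by this issue):
    {service: {component: [definition, ...]}}
    """
    missing = []
    for service, components in alerts.items():
        for component, definitions in components.items():
            for definition in definitions:
                description = definition.get("description")
                if not isinstance(description, str) or not description.strip():
                    name = definition.get("name", "<unnamed>")
                    missing.append((service, component, name))
    return missing


# Hypothetical sample data; the alert names are made up for illustration.
sample = json.loads("""
{
  "HDFS": {
    "NAMENODE": [
      {"name": "namenode_webui",
       "description": "This host-level alert is triggered if the NameNode Web UI is unreachable."}
    ],
    "DATANODE": [
      {"name": "datanode_process"}
    ]
  }
}
""")

for service, component, name in find_missing_descriptions(sample):
    print(f"{service}/{component}: {name} has no description")
```

In practice the same function could be run over each alerts.json file loaded with `json.load`, once the real file layout has been confirmed.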

      Alert | Service | Description
      DataNode Process | HDFS | This host-level alert is triggered if the individual DataNode processes cannot be established to be up and listening on the network for the configured critical threshold.
      NameNode Process | HDFS | This host-level alert is triggered if the NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
      NameNode Host CPU Utilization | HDFS | This host-level alert is triggered if CPU utilization of the NameNode exceeds certain warning and critical thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad property.
      NameNode Blocks Health | HDFS | This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold.
      DataNode Storage | HDFS | This host-level alert is triggered if storage capacity is full on the DataNode. It checks the DataNode JMX Servlet for the Capacity and Remaining properties.
      NameNode Web UI | HDFS | This host-level alert is triggered if the NameNode Web UI is unreachable.
      Percent DataNodes With Available Space | HDFS | This service-level alert is triggered if the percentage of DataNodes whose storage is full exceeds the configured warning and critical thresholds.
      Percent DataNodes Available | HDFS | This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It aggregates the results of DataNode process checks.
      NameNode RPC Latency | HDFS | This host-level alert is triggered if the NameNode operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for NameNode operations.
      HDFS Capacity Utilization | HDFS | This service-level alert is triggered if the HDFS capacity utilization exceeds the configured warning and critical thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and CapacityRemaining properties.
      DataNode Web UI | HDFS | This host-level alert is triggered if the DataNode Web UI is unreachable.
      Secondary NameNode Process | HDFS | This host-level alert is triggered if the Secondary NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
      JournalNode Process | HDFS | This host-level alert is triggered if the JournalNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
      ZooKeeper Failover Controller Process | HDFS | This host-level alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network for the configured critical threshold.
      Percent JournalNodes Available | HDFS | This alert is triggered if the number of down JournalNodes in the cluster is greater than the configured critical threshold. It aggregates the results of JournalNode process checks.
      NameNode High Availability Health | HDFS | This service-level alert is triggered if either the Active NameNode or the Standby NameNode is not running.
      History Server Process | MAPREDUCE2 | This host-level alert is triggered if the HistoryServer process cannot be established to be up and listening on the network for the configured critical threshold.
      History Server RPC Latency | MAPREDUCE2 | This host-level alert is triggered if the HistoryServer operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for operations.
      History Server CPU Utilization | MAPREDUCE2 | This host-level alert is triggered if the percent of CPU utilization on the HistoryServer exceeds the configured critical threshold.
      History Server Web UI | MAPREDUCE2 | This host-level alert is triggered if the HistoryServer Web UI is unreachable.
      ZooKeeper Server Process | ZOOKEEPER | This host-level alert is triggered if the ZooKeeper server process cannot be determined to be up and listening on the network for the configured critical threshold.
      Percent ZooKeeper Servers Available | ZOOKEEPER | This service-level alert is triggered if the configured percentage of ZooKeeper processes cannot be determined to be up and listening on the network for the configured critical threshold. It aggregates the results of ZooKeeper process checks.
      ResourceManager RPC Latency | YARN | This host-level alert is triggered if the ResourceManager operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for ResourceManager operations.
      ResourceManager CPU Utilization | YARN | This host-level alert is triggered if CPU utilization of the ResourceManager exceeds certain warning and critical thresholds. It checks the ResourceManager JMX Servlet for the SystemCPULoad property.
      NodeManager Health | YARN | This host-level alert checks the node health property available from the NodeManager component.
      Percent NodeManagers Available | YARN | This alert is triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It aggregates the results of NodeManager process checks.
      ResourceManager Web UI | YARN | This host-level alert is triggered if the ResourceManager Web UI is unreachable.
      App Timeline Web UI | YARN | This host-level alert is triggered if the App Timeline Server Web UI is unreachable.
      NodeManager Web UI | YARN | This host-level alert is triggered if the NodeManager Web UI is unreachable.
      NameNode Last Checkpoint | HDFS | This alert checks the last time that the NameNode performed a checkpoint. The script also checks the number of uncommitted transactions.
      NameNode Directory Status | HDFS | This alert checks the NameNode JMX Servlet for the NameDirStatuses metric to see if any directories report a failure.
      Percent RegionServers process | HBASE | This service-level alert is triggered if the configured percentage of RegionServer processes cannot be determined to be up and listening on the network for the configured warning and critical thresholds. It aggregates the results of RegionServer process down checks.
      Percent HBase Master process | HBASE | This alert is triggered if the HBase Master processes cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
      HBase Master Web UI | HBASE | This host-level alert is triggered if the HBase Master Web UI is unreachable.
      Percent HBase Master CPU utilization | HBASE | This host-level alert is triggered if CPU utilization of the HBase Master exceeds certain warning and critical thresholds. It checks the HBase Master JMX Servlet for the SystemCPULoad property.
      RegionServer process | HBASE | This host-level alert is triggered if the RegionServer processes cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
      Hive Metastore status | HIVE | This host-level alert is triggered if the Hive Metastore process cannot be determined to be up and listening on the network for the configured critical threshold.
      WebHCat Server process | HIVE | This host-level alert is triggered if the WebHCat server cannot be determined to be up and responding to client requests.
      Oozie Server process | OOZIE | This host-level alert is triggered if the Oozie server cannot be determined to be up and responding to client requests.
      Knox Gateway process | KNOX | This host-level alert is triggered if the Knox Gateway cannot be determined to be up.
      Kafka Broker process | KAFKA | This host-level alert is triggered if the Kafka Broker cannot be determined to be up.
      Falcon Server Web UI | FALCON | This host-level alert is triggered if the Falcon Server Web UI is unreachable.
      Falcon Server process UI | FALCON | This host-level alert is triggered if the Falcon Server cannot be determined to be up.
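One possible way to apply the descriptions listed above is a lookup keyed by service and alert label, filled into any definition that has no description yet. This is only a sketch: the `label` key, the service/component/definition nesting, the alert names, and the helper `apply_descriptions` are assumptions made for illustration, and only two entries from the list are shown.

```python
# Two entries copied from the list above, keyed by (service, alert label).
DESCRIPTIONS = {
    ("HDFS", "NameNode Web UI"):
        "This host-level alert is triggered if the NameNode Web UI is unreachable.",
    ("KNOX", "Knox Gateway process"):
        "This host-level alert is triggered if the Knox Gateway cannot be determined to be up.",
}


def apply_descriptions(alerts, descriptions):
    """Fill in missing "description" fields in place and return how many
    definitions were updated. Existing descriptions are left untouched.

    Assumed layout (hypothetical): {service: {component: [definition, ...]}},
    where each definition carries a human-readable "label".
    """
    updated = 0
    for service, components in alerts.items():
        for definitions in components.values():
            for definition in definitions:
                text = descriptions.get((service, definition.get("label")))
                if text and not definition.get("description"):
                    definition["description"] = text
                    updated += 1
    return updated


# Hypothetical definitions; names and component keys are illustrative only.
alerts = {
    "HDFS": {"NAMENODE": [{"name": "namenode_webui", "label": "NameNode Web UI"}]},
    "KNOX": {"KNOX_GATEWAY": [{"name": "knox_gateway_process",
                               "label": "Knox Gateway process",
                               "description": "Already documented."}]},
}

count = apply_descriptions(alerts, DESCRIPTIONS)
print(f"updated {count} definition(s)")
```

Skipping definitions that already have a description keeps the helper safe to run repeatedly over the same files.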

          People

            Assignee: Jonathan Hurley (jonathanhurley)
            Reporter: Jonathan Hurley (jonathanhurley)