[AMBARI-8569] Alert JSON Files Need Descriptions - ASF JIRA


Details

Type: Task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.0
Component/s: alerts
Labels: None
Epic Link: Alerts Redesign

Description

BUG-28018 adds a new description field to an alert definition. The alerts.json files for every service in every stack should be updated to have this field for each alert definition.
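To confirm that every definition actually gained the new field, a small check script can walk each alerts.json file. The sketch below is a minimal, hypothetical helper. It assumes alerts.json is a JSON object keyed by service name, and that each value maps a component name (or `service` for service-level alerts) to a list of definition objects with a `name` key. That layout is an assumption, not something stated in this issue.

```python
import json
from typing import Any, Dict, List


def find_missing_descriptions(alerts: Dict[str, Any]) -> List[str]:
    """Return 'SERVICE/GROUP/name' for every definition lacking a description.

    Assumed layout (hypothetical): {service: {component_or_"service": [definition, ...]}}.
    """
    missing = []
    for service, groups in alerts.items():
        for group, definitions in groups.items():
            for definition in definitions:
                description = definition.get("description", "")
                # Absent, non-string, empty, or whitespace-only descriptions count as missing.
                if not isinstance(description, str) or not description.strip():
                    missing.append(f"{service}/{group}/{definition.get('name', '?')}")
    return missing


def check_file(path: str) -> List[str]:
    """Load one alerts.json file and report definitions without a description."""
    with open(path, encoding="utf-8") as handle:
        return find_missing_descriptions(json.load(handle))
```

For example, given `{"HDFS": {"service": [{"name": "example_ok", "description": "text"}], "NAMENODE": [{"name": "example_missing"}]}}`, the helper reports only `HDFS/NAMENODE/example_missing` (the definition names are illustrative).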

Alert Name	Service	Description
DataNode Process	HDFS	This host-level alert is triggered if the individual DataNode processes cannot be confirmed to be up and listening on the network for the configured critical threshold.
NameNode Process	HDFS	This host-level alert is triggered if the NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
NameNode Host CPU Utilization	HDFS	This host-level alert is triggered if CPU utilization of the NameNode exceeds the configured warning and critical thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad property.
NameNode Blocks Health	HDFS	This service-level alert is triggered if the number of corrupt or missing blocks exceeds the configured critical threshold.
DataNode Storage	HDFS	This host-level alert is triggered if storage capacity is full on the DataNode. It checks the DataNode JMX Servlet for the Capacity and Remaining properties.
NameNode Web UI	HDFS	This host-level alert is triggered if the NameNode Web UI is unreachable.
Percent DataNodes With Available Space	HDFS	This service-level alert is triggered if the percentage of DataNodes with full storage exceeds the configured warning and critical thresholds.
Percent DataNodes Available	HDFS	This alert is triggered if the number of down DataNodes in the cluster is greater than the configured critical threshold. It aggregates the results of DataNode process checks.
NameNode RPC Latency	HDFS	This host-level alert is triggered if the NameNode operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for NameNode operations.
HDFS Capacity Utilization	HDFS	This service-level alert is triggered if the HDFS capacity utilization exceeds the configured warning and critical thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and CapacityRemaining properties.
DataNode Web UI	HDFS	This host-level alert is triggered if the DataNode Web UI is unreachable.
Secondary NameNode Process	HDFS	This host-level alert is triggered if the Secondary NameNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
JournalNode Process	HDFS	This host-level alert is triggered if the JournalNode process cannot be confirmed to be up and listening on the network for the configured critical threshold.
ZooKeeper Failover Controller Process	HDFS	This host-level alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network for the configured critical threshold.
Percent JournalNodes Available	HDFS	This alert is triggered if the number of down JournalNodes in the cluster is greater than the configured critical threshold. It aggregates the results of JournalNode process checks.
NameNode High Availability Health	HDFS	This service-level alert is triggered if either the Active NameNode or the Standby NameNode is not running.
History Server Process	MAPREDUCE2	This host-level alert is triggered if the HistoryServer process cannot be confirmed to be up and listening on the network for the configured critical threshold.
History Server RPC Latency	MAPREDUCE2	This host-level alert is triggered if the HistoryServer operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for operations.
History Server CPU Utilization	MAPREDUCE2	This host-level alert is triggered if the percent of CPU utilization on the HistoryServer exceeds the configured critical threshold.
History Server Web UI	MAPREDUCE2	This host-level alert is triggered if the HistoryServer Web UI is unreachable.
ZooKeeper Server Process	ZOOKEEPER	This host-level alert is triggered if the ZooKeeper server process cannot be determined to be up and listening on the network for the configured critical threshold.
Percent ZooKeeper Servers Available	ZOOKEEPER	This service-level alert is triggered if the configured percentage of ZooKeeper processes cannot be determined to be up and listening on the network for the configured critical threshold. It aggregates the results of ZooKeeper process checks.
ResourceManager RPC Latency	YARN	This host-level alert is triggered if the ResourceManager operations RPC latency exceeds the configured critical threshold. Typically an increase in the RPC processing time increases the RPC queue length, causing the average queue wait time to increase for ResourceManager operations.
ResourceManager CPU Utilization	YARN	This host-level alert is triggered if CPU utilization of the ResourceManager exceeds the configured warning and critical thresholds. It checks the ResourceManager JMX Servlet for the SystemCPULoad property.
NodeManager Health	YARN	This host-level alert checks the node health property available from the NodeManager component.
Percent NodeManagers Available	YARN	This alert is triggered if the number of down NodeManagers in the cluster is greater than the configured critical threshold. It aggregates the results of NodeManager process checks.
ResourceManager Web UI	YARN	This host-level alert is triggered if the ResourceManager Web UI is unreachable.
App Timeline Web UI	YARN	This host-level alert is triggered if the App Timeline Server Web UI is unreachable.
NodeManager Web UI	YARN	This host-level alert is triggered if the NodeManager Web UI is unreachable.
NameNode Last Checkpoint	HDFS	This alert checks when the NameNode last performed a checkpoint. The alert script also checks the number of uncommitted transactions.
NameNode Directory Status	HDFS	This alert checks the NameNode JMX Servlet for the NameDirStatuses metric to see if any directories report a failure.
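Several of the HDFS rows above (for example HDFS Capacity Utilization, which reads CapacityUsed and CapacityRemaining) compare a utilization percentage against warning and critical thresholds. The following is a minimal sketch of that logic, assuming utilization is CapacityUsed / (CapacityUsed + CapacityRemaining) and that a value equal to a threshold trips it. The function names and the 80/90 thresholds in the usage note are hypothetical and illustrative, not taken from the stack definitions.

```python
def capacity_utilization_percent(capacity_used: float, capacity_remaining: float) -> float:
    """Percent of capacity in use, assuming total = used + remaining."""
    total = capacity_used + capacity_remaining
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * capacity_used / total


def alert_state(value: float, warning: float, critical: float) -> str:
    """Map a metric value to OK / WARNING / CRITICAL (thresholds are inclusive here)."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"
```

With these assumptions, `alert_state(capacity_utilization_percent(80, 20), warning=80, critical=90)` evaluates to `"WARNING"`, because 80 percent meets the warning threshold but not the critical one.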

Percent RegionServers process	HBASE	This service-level alert is triggered if the configured percentage of RegionServer processes cannot be determined to be up and listening on the network for the configured warning and critical thresholds. It aggregates the results of RegionServer process down checks.
Percent HBase Master process	HBASE	This alert is triggered if the HBase Master process cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
HBase Master Web UI	HBASE	This host-level alert is triggered if the HBase Master Web UI is unreachable.
Percent HBase Master CPU utilization	HBASE	This host-level alert is triggered if CPU utilization of the HBase Master exceeds the configured warning and critical thresholds. It checks the HBase Master JMX Servlet for the SystemCPULoad property.
RegionServer process	HBASE	This host-level alert is triggered if the RegionServer process cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.
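The "Percent ..." rows (Percent RegionServers process, Percent DataNodes Available, Percent NodeManagers Available, and others) aggregate the results of per-host process checks. A minimal sketch of that aggregation follows. It assumes that any child result other than `OK` counts as down, and that both thresholds are percentages compared inclusively. The function name and the `UNKNOWN` result for an empty input are hypothetical choices, not Ambari's documented behavior.

```python
from typing import Iterable


def aggregate_state(child_states: Iterable[str], warning_pct: float, critical_pct: float) -> str:
    """Derive a service-level state from per-host process check results.

    The percentage of child results that are not OK is compared against
    the warning and critical thresholds (both expressed as percentages).
    """
    states = list(child_states)
    if not states:
        # No hosts reported: nothing to aggregate (hypothetical convention).
        return "UNKNOWN"
    down = sum(1 for state in states if state != "OK")
    down_pct = 100.0 * down / len(states)
    if down_pct >= critical_pct:
        return "CRITICAL"
    if down_pct >= warning_pct:
        return "WARNING"
    return "OK"
```

For example, with one of four RegionServers down, `aggregate_state(["OK", "OK", "OK", "CRITICAL"], warning_pct=10, critical_pct=30)` returns `"WARNING"` (25 percent down).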

Hive Metastore status	HIVE	This host-level alert is triggered if the Hive Metastore process cannot be determined to be up and listening on the network for the configured critical threshold.
WebHCat Server process	HIVE	This host-level alert is triggered if the WebHCat server cannot be determined to be up and responding to client requests.

Oozie Server process	OOZIE	This host-level alert is triggered if the Oozie server cannot be determined to be up and responding to client requests.

Knox Gateway process	KNOX	This host-level alert is triggered if the Knox Gateway cannot be determined to be up.
Kafka Broker process	KAFKA	This host-level alert is triggered if the Kafka Broker cannot be determined to be up.
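Many of the process rows ("cannot be determined to be up and listening on the network") reduce to checking whether a TCP port accepts connections. This is a minimal sketch using the standard library's `socket.create_connection`. The host, port, and 5-second default timeout are hypothetical, and the real alert definitions may also grade slow responses as WARNING. This sketch only reports reachable or not.

```python
import socket


def is_listening(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError),
        # and name-resolution failures (socket.gaierror subclasses OSError).
        return False
```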

Falcon Server Web UI	FALCON	This host-level alert is triggered if the Falcon Server Web UI is unreachable.
Falcon Server process	FALCON	This host-level alert is triggered if the Falcon Server cannot be determined to be up.
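The Web UI rows (NameNode Web UI, DataNode Web UI, ResourceManager Web UI, Falcon Server Web UI, and others) are triggered when a UI is unreachable. A minimal sketch of such a check issues an HTTP GET with `urllib.request.urlopen` and treats any connection failure or HTTP error status as unreachable. The URL and timeout are hypothetical, and the actual alerts may distinguish between error types rather than collapsing them into a single result.

```python
import urllib.request


def web_ui_reachable(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if an HTTP GET to url completes with a status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status < 400
    except OSError:
        # urllib.error.HTTPError (4xx/5xx) and URLError (refused, DNS, timeout)
        # are both OSError subclasses, so any failure is reported as unreachable.
        return False
```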

People

Assignee: Jonathan Hurley

Reporter: Jonathan Hurley

Votes: 0

Watchers: 2

Dates

Created: 05/Dec/14 19:34

Updated: 08/Dec/14 17:18

Resolved: 08/Dec/14 16:20