Details

    • Epic Name:
      Alerts Redesign

      Description

      The purpose of this umbrella JIRA is to track a new feature that enables Ambari to be the source of Alert data for a cluster.

      • Treat Nagios as a "black box" rather than a hard dependency of Ambari.
      • Allow custom defined alerts and thresholds.
      • Flexibly define how alerts get published and who receives them.
      • Use stacks to define baseline alerts for a cluster that can themselves be customized.

      (As design documents are completed, they will be attached to this JIRA)

      1. AlertTechDesignPublic.pdf
        380 kB
        Jonathan Hurley
      2. AlertTechDesignPublic.pdf
        356 kB
        Nate Cole

        Issue Links

          Issues in Epic

            Activity

            ncole@hortonworks.com Nate Cole added a comment -

            Added Tech Design

            vitthal_gogate vitthal (Suhas) Gogate added a comment - - edited

            Nate, thanks for sharing the design of alert monitoring in Ambari. Here are some review comments and questions about the design:

            – For various reasons, it is very important for Ambari to support alert monitoring through a built-in mechanism via Ambari agents rather than relying solely on external monitoring services like Nagios. So thanks for initiating the effort!

            – However, a pluggable interface that lets Ambari work with existing monitoring systems such as Nagios or Zabbix would be really important for users who have already invested in those systems for their other infrastructure components. In the current design, Ambari does the alert monitoring via its agents and then dispatches these alerts to other monitoring systems. This may not be ideal for some users (see the examples below).

              • Users familiar with an existing monitoring system may want to configure all alerts, including those for Hadoop services, in one place.
              • Advanced alert management features such as aggregation, correlation, and mitigation can be applied uniformly across all alerts in their existing monitoring system, as opposed to partly in Ambari.
              • Etc.

            – So I was wondering: could we make the Ambari server communicate internally with existing monitoring systems through a standard pluggable interface? This design point is orthogonal to the current Ambari alert monitoring design. Ambari's alert monitoring would be one implementation of the pluggable monitoring interface and would serve as the out-of-the-box default for Ambari-managed services. An added advantage is that it would let other users add an implementation for their existing monitoring system.

            I can propose an initial draft of the interface for alert monitoring and see how we can get the existing Ambari alert monitoring to plug in under that interface.

            Let me know what you think?
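
Editor's sketch of the pluggable monitoring interface proposed in this comment. This is only an illustration under stated assumptions: `AlertMonitoringProvider`, `BuiltInAgentProvider`, `ExternalSystemProvider`, and `MonitoringRegistry` are hypothetical names that do not come from Ambari or from the attached design document, and the alert data is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: every monitoring backend plugs in behind this interface.
interface AlertMonitoringProvider {
    String name();

    // Returns current alert states keyed by alert name.
    Map<String, String> collectAlerts();
}

// Default implementation: stands in for Ambari's own agent-based collection.
final class BuiltInAgentProvider implements AlertMonitoringProvider {
    public String name() {
        return "ambari-agent";
    }

    public Map<String, String> collectAlerts() {
        Map<String, String> alerts = new LinkedHashMap<>();
        alerts.put("namenode_process", "OK"); // placeholder data, not real output
        return alerts;
    }
}

// Adapter for an existing monitoring system (for example Nagios or Zabbix).
final class ExternalSystemProvider implements AlertMonitoringProvider {
    private final String systemName;

    ExternalSystemProvider(String systemName) {
        this.systemName = systemName;
    }

    public String name() {
        return systemName;
    }

    public Map<String, String> collectAlerts() {
        // A real adapter would query the external system here.
        return new LinkedHashMap<>();
    }
}

// Picks the configured provider and falls back to the first one registered.
final class MonitoringRegistry {
    private final Map<String, AlertMonitoringProvider> providers = new LinkedHashMap<>();
    private String defaultName;

    void register(AlertMonitoringProvider provider) {
        if (providers.isEmpty()) {
            defaultName = provider.name();
        }
        providers.put(provider.name(), provider);
    }

    AlertMonitoringProvider select(String configuredName) {
        AlertMonitoringProvider provider = providers.get(configuredName);
        return provider != null ? provider : providers.get(defaultName);
    }
}
```

Registering `BuiltInAgentProvider` first makes it the default, so `select(null)` returns the built-in collection, while `select("nagios")` returns an external adapter if one was registered under that name.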

            ncole@hortonworks.com Nate Cole added a comment -

            The facility already exists for users to use any infrastructure they want; see the individual stack components for NAGIOS service definitions. As for the refactor, we still want Ambari to do all the collection so that you can get information about the health of your cluster without any such dependencies. Making it pluggable is certainly possible, but the initial goal is to remove the dependency on Nagios.

            vitthal_gogate vitthal (Suhas) Gogate added a comment -

            I think by design it would be good to have Ambari's collection of alerts be the default but optional, i.e. if I choose to go with Nagios, I may not want Ambari's default alert monitoring. In other words, Ambari's built-in alert monitoring can be treated as an optional pluggable service. Is this achievable after the refactor work?

            Again, the current refactor work to remove the Nagios dependency is good, but at the same time I wanted to see how we can make the design pluggable, and I wanted to help out along those lines.

            I am reviewing the design and code and will propose some specific changes with as little disruption to existing work as possible.

            keyki Krisztian Horvath added a comment -

            I didn't follow the development closely, so is it possible to configure WEB endpoints for the dispatcher to call back?

            jonathan.hurley Jonathan Hurley added a comment -

            WEB style alerts are used to test an endpoint that serves HTTP/HTTPS. Once you have created the alert to check your web endpoint, you can associate any number of alert targets (dispatchers) with that alert. Targets support SNMP and SMTP currently. This allows us to decouple the type of alert from the notification mechanism.
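
Editor's sketch of the decoupling described in this comment: one alert that checks a web endpoint, associated with any number of notification targets. The class names (`WebAlert`, `AlertTarget`, `TargetKind`), the two-state result, and the "below 400 is healthy" rule are simplifying assumptions for illustration and do not mirror Ambari's actual classes or thresholds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two notification mechanisms named in the thread.
enum TargetKind { SNMP, SMTP }

// A notification target; delivery is recorded in a list instead of being sent.
final class AlertTarget {
    final String name;
    final TargetKind kind;
    final List<String> delivered = new ArrayList<>();

    AlertTarget(String name, TargetKind kind) {
        this.name = name;
        this.kind = kind;
    }

    void notifyTarget(String message) {
        delivered.add(kind + ": " + message);
    }
}

// An alert that evaluates a web endpoint and knows nothing about how
// its targets deliver notifications.
final class WebAlert {
    final String name;
    final String url;
    private final List<AlertTarget> targets = new ArrayList<>();

    WebAlert(String name, String url) {
        this.name = name;
        this.url = url;
    }

    void addTarget(AlertTarget target) {
        targets.add(target);
    }

    // Assumption: any HTTP status below 400 counts as healthy.
    String evaluate(int httpStatus) {
        return httpStatus < 400 ? "OK" : "CRITICAL";
    }

    // Evaluates the status and fans the result out to every associated target.
    void report(int httpStatus) {
        String state = evaluate(httpStatus);
        for (AlertTarget target : targets) {
            target.notifyTarget(name + " is " + state);
        }
    }
}
```

Because the alert only holds a list of targets, adding a new kind of target would not require any change to how the WEB check itself is evaluated.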

            keyki Krisztian Horvath added a comment -

            Thanks for the quick reply. I see. Is there a JIRA for web notifications, or is it not planned at all? Our use case is to scale the cluster based on different types of metrics.

            jonathan.hurley Jonathan Hurley added a comment -

            I'm not clear on exactly what you're asking about. Can you explain a bit more what you mean when you say "web notifications"?

            keyki Krisztian Horvath added a comment -

            As you said, SNMP and SMTP are currently supported for alert notifications. From a quick look at the code, I would need to implement the NotificationDispatcher interface to get a dispatcher that can call back to target endpoints when an alert triggers, for example by sending a POST request to a target endpoint, similar to an Amazon SNS topic. Use case: when HDFS disk space reaches 90%, call back to a URL with the alert details, which would trigger an event in a third-party application to act on it.
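
Editor's sketch of the callback dispatcher described in this comment: serialize the alert details as JSON and POST them to a target URL. `SketchDispatcher` is a hypothetical stand-in, since Ambari's real NotificationDispatcher interface has its own method signatures that are not shown in this thread. The payload field names are also assumptions. The HTTP calls use the standard `java.net.http` client (Java 11 or later).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical stand-in for a dispatcher contract (not Ambari's interface).
interface SketchDispatcher {
    void dispatch(String alertName, String state, String text) throws Exception;
}

final class WebhookDispatcher implements SketchDispatcher {
    private final URI callbackUrl;
    private final HttpClient client = HttpClient.newHttpClient();

    WebhookDispatcher(String callbackUrl) {
        this.callbackUrl = URI.create(callbackUrl);
    }

    // Minimal escaping: handles backslashes and double quotes only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body; the field names are assumptions for this sketch.
    static String toJson(String alertName, String state, String text) {
        return "{\"alert\":\"" + escape(alertName)
            + "\",\"state\":\"" + escape(state)
            + "\",\"text\":\"" + escape(text) + "\"}";
    }

    // Prepares the POST request without sending it.
    HttpRequest buildRequest(String alertName, String state, String text) {
        return HttpRequest.newBuilder(callbackUrl)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(toJson(alertName, state, text)))
            .build();
    }

    // Sends the request and treats any status of 300 or above as a failure.
    public void dispatch(String alertName, String state, String text) throws Exception {
        HttpResponse<String> response = client.send(
            buildRequest(alertName, state, text),
            HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IllegalStateException("Callback failed: HTTP " + response.statusCode());
        }
    }
}
```

Keeping `toJson` and `buildRequest` separate from `dispatch` lets the payload be checked without a network call. A production dispatcher would likely use a JSON library and add timeouts and retries.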

            jonathan.hurley Jonathan Hurley added a comment -

            Thanks for clarifying that. You are correct that you'd need to implement the NotificationDispatcher interface and provide your own functionality for this. In Ambari 2.0.0, the only supported dispatchers are the two I mentioned: SNMP and SMTP. However, for Ambari 2.1.0, we plan to expose a way to pick up custom dispatchers from the classpath.


              People

              • Assignee:
                ncole@hortonworks.com Nate Cole
                Reporter:
                ncole@hortonworks.com Nate Cole
              • Votes:
                0
                Watchers:
                8

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Development