Details
Epic
Status: Resolved
Critical
Resolution: Fixed
2.5.0
None
AmbariServer-Metrics
Description
Ambari's architectural design is based on having a single master server with multiple agents. Each agent sends a heartbeat every X seconds to the server to report its status; the server may reply with a list of commands to be run by each agent.
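To make the heartbeat exchange concrete, here is a minimal in-memory sketch. The `Server` class, its method names, the host name, and the command string are all hypothetical and chosen for illustration; they are not Ambari's actual classes or wire protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical master: queues commands per agent and answers heartbeats."""
    pending: dict = field(default_factory=dict)      # agent host -> queued commands
    last_status: dict = field(default_factory=dict)  # agent host -> last reported status

    def queue_command(self, host, command):
        # Commands wait here until the agent's next heartbeat arrives.
        self.pending.setdefault(host, []).append(command)

    def handle_heartbeat(self, host, status):
        # Record the agent's status, then hand back (and clear) its queued commands.
        self.last_status[host] = status
        return self.pending.pop(host, [])


server = Server()
server.queue_command("host-1", "RESTART_DATANODE")  # hypothetical command name

# The agent would send one heartbeat every X seconds; two are simulated here.
first_reply = server.handle_heartbeat("host-1", "HEALTHY")
second_reply = server.handle_heartbeat("host-1", "HEALTHY")
```

In this sketch the first heartbeat returns the queued command and the second returns an empty list, since the queue was drained by the first reply.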
An operational cluster may have up to 2000-4000 agents, and Ambari needs to be robust and performant at that scale. Ambari's overall performance often depends on the cluster's environment, such as network latency and stability or Ambari database call latency. In such environments, detecting the cause of Ambari's sluggish performance and/or instability has proven difficult in practice.
Ambari should intercept and store the time and resources taken to serve requests. This information can then be presented to the end user in Ambari Web and/or Grafana.
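One way to picture the interception is a wrapper that times each request handler and stores the duration per endpoint. The sketch below is illustrative only: the `timed` decorator, the `request_timings` store, and the `handle_heartbeat` handler are assumed names, not part of Ambari, and it records wall-clock time only, not other resources.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: endpoint name -> list of durations in seconds.
request_timings = defaultdict(list)


def timed(endpoint):
    """Wrap a handler so the time taken by each call is recorded."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                # Record the duration even if the handler raises.
                request_timings[endpoint].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("heartbeat")
def handle_heartbeat(host):
    time.sleep(0.01)  # stand-in for real work such as a database call
    return []


def summarize(endpoint):
    """Aggregate the stored samples into values a dashboard could plot."""
    samples = request_timings[endpoint]
    return {
        "count": len(samples),
        "avg": sum(samples) / len(samples),
        "max": max(samples),
    }


for host in ("host-1", "host-2", "host-3"):
    handle_heartbeat(host)

summary = summarize("heartbeat")
```

A real implementation would presumably push these samples to a metrics sink rather than keep them in process memory; the aggregation step is shown only to indicate what could be charted in Ambari Web or Grafana.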
Optionally, this work can be extended so that Ambari Web persists the time taken to process the response of each API call, along with other performance characteristics. This client-side performance data can likewise be presented to the end user via Ambari Web and/or Grafana.