Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Soft failure detection exists in Helix, but we can do better. This task tracks the design of a new health monitoring system that we can leverage to find issues even faster.
Design document: https://cwiki.apache.org/confluence/display/HELIX/Helix+Monitoring+Design
Sub-Tasks
1. Write a monitoring entry point that leverages the container module | Open | Unassigned
2. Write a Clojure API for Helix/Riemann events | Open | Unassigned
3. Define alert classes for Helix monitoring | Open | Unassigned
4. Handle alerts in controller | Open | Unassigned
5. Manage monitoring client connections | Open | Unassigned
6. Create a monitoring client API | Open | Unassigned
7. Write some basic health-based alert configs | Open | Unassigned
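For discussion only, here is a minimal sketch of what a health-based alert check along the lines of the alert-class and alert-config sub-tasks might look like. Everything in it is an assumption made for illustration: the class names (HealthAlertSketch, HealthEvent, AlertRule), the metric name, and the threshold semantics are hypothetical and are not taken from Helix, Riemann, or the design document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types exist in Helix or Riemann.
final class HealthAlertSketch {

    /** One health reading reported by a monitored node (illustrative shape). */
    static final class HealthEvent {
        final String node;
        final String metric;
        final double value;

        HealthEvent(String node, String metric, double value) {
            this.node = node;
            this.metric = metric;
            this.value = value;
        }
    }

    /** A threshold rule: fires when the named metric is strictly above the limit. */
    static final class AlertRule {
        final String name;
        final String metric;
        final double limit;

        AlertRule(String name, String metric, double limit) {
            this.name = name;
            this.metric = metric;
            this.limit = limit;
        }

        boolean matches(HealthEvent event) {
            return metric.equals(event.metric) && event.value > limit;
        }
    }

    /** Returns "ruleName:node" for every event that trips a rule, in event order. */
    static List<String> evaluate(List<AlertRule> rules, List<HealthEvent> events) {
        List<String> fired = new ArrayList<>();
        for (HealthEvent event : events) {
            for (AlertRule rule : rules) {
                if (rule.matches(event)) {
                    fired.add(rule.name + ":" + event.node);
                }
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Assumed example rule: alert when heap usage ratio exceeds 0.9.
        List<AlertRule> rules = Arrays.asList(
                new AlertRule("high-heap", "heap.used.ratio", 0.9));
        List<HealthEvent> events = Arrays.asList(
                new HealthEvent("node-1", "heap.used.ratio", 0.95),
                new HealthEvent("node-2", "heap.used.ratio", 0.40));
        System.out.println(evaluate(rules, events));
    }
}
```

In a real design the controller, rather than a standalone loop, would presumably consume the fired alerts, and the rules would be loaded from configuration instead of being constructed inline.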