Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
Soft failure detection exists in Helix, but we can do better. This task tracks the design of a new health monitoring system that we can leverage to find issues even faster.
Design document: https://cwiki.apache.org/confluence/display/HELIX/Helix+Monitoring+Design
Attachments
Sub-Tasks
1. Write a monitoring entry point that leverages the container module | Open | Unassigned
2. Run stress tests on Riemann | Resolved | Kanak Biscuitwala
3. Start Riemann as a thread within an existing Java process | Resolved | Kanak Biscuitwala
4. Manage monitoring configs for Helix and apps | Resolved | Kanak Biscuitwala
5. Write a clojure API for Helix/Riemann events | Open | Unassigned
6. Define alert classes for Helix monitoring | Open | Unassigned
7. Handle alerts in controller | Open | Unassigned
8. Manage monitoring client connections | Open | Unassigned
9. Create a monitoring client API | Open | Unassigned
10. Write some basic health-based alert configs | Open | Unassigned
11. Consider removing ZKPropertyTransferServer | Closed | Unassigned