Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
Redhat EL 5, Java 6
-
Reviewed
Description
Recently, a new load-balancing algorithm was added to improve communication between the Chukwa agent and the Chukwa collectors. The design was to send one HTTP POST per collector and rotate through the list of collectors to spread the load across them. When a collector fails to respond, it is blacklisted for 5 minutes. If no collector responds, the agent sleeps for a random interval of 1 to 5 minutes. Unfortunately, this algorithm causes problems on slower machines, which end up blacklisting all collectors and sleeping indefinitely. This ticket restores the algorithm to the original design: the agent shuffles the collector list, then makes a best-effort attempt to keep sending HTTP POSTs to the same collector until an error occurs, at which point it iterates through the shuffled list of collectors.
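The restored behavior can be sketched roughly as follows. This is an illustration only, not the code from the patch: the class name StickyCollectorSelector, the Poster interface, and the send method are hypothetical names, and the real HTTP POST is replaced by a callback that returns true on success. The sketch assumes that a failed POST is the only trigger for moving on, and that one pass over the shuffled list is made per send.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of "stick to one collector until it fails". */
class StickyCollectorSelector {

    /** Hypothetical stand-in for the HTTP POST; returns true on success. */
    interface Poster {
        boolean post(String collectorUrl, byte[] payload);
    }

    private final List<String> collectors;
    private int current = 0; // index of the collector the agent currently sticks to

    StickyCollectorSelector(List<String> configured, Random rng) {
        if (configured.isEmpty()) {
            throw new IllegalArgumentException("at least one collector is required");
        }
        // Shuffle a private copy once, so each agent starts on a different collector.
        this.collectors = new ArrayList<String>(configured);
        Collections.shuffle(this.collectors, rng);
    }

    String currentCollector() {
        return collectors.get(current);
    }

    /**
     * Posts to the current collector first. On failure, advances through the
     * shuffled list, trying each collector at most once per call.
     * Returns the collector that accepted the payload, or null if all failed.
     */
    String send(byte[] payload, Poster poster) {
        for (int attempt = 0; attempt < collectors.size(); attempt++) {
            String target = collectors.get(current);
            if (poster.post(target, payload)) {
                return target; // success: keep using this collector next time
            }
            // Failure: move to the next collector; no blacklist timer is kept.
            current = (current + 1) % collectors.size();
        }
        return null; // every collector failed on this pass
    }
}
```

Because no blacklist or timer is kept in this sketch, a slow machine cannot lock itself out of every collector; what the caller does when send returns null (retry, back off) is left open here.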
Attachments
Issue Links
- duplicates
-
HADOOP-4711 Chukwa - Add a config parameter to allow agent to talk to the same collector until connection fails.
-
- Resolved
-