Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Fixed
- 2.5.2
- None
Description
PROBLEM
When two metric collectors are started simultaneously, both fail to start.
BUG
A race condition exists in the Metric Collector HA controller initialization, introduced by AMBARI-20179. When a Helix controller instance finds that the /ambari-metrics-collector znode exists but a child node does not exist, it deletes the entire znode and recreates it. If another controller instance initializes at the same time, each instance can end up undoing the work of the other.
FIX
Do not delete and recreate the znode. Instead, wait and retry for a few seconds to check whether /ambari-metrics-collector has been fully initialized.
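The wait-and-retry approach could look roughly like the sketch below. This is not the actual patch: the class name, method name, attempt count, and sleep interval are all hypothetical, and the check is passed in as a plain BooleanSupplier so the example runs without ZooKeeper or Helix. In the real controller, the supplier would presumably test whether the expected child node under /ambari-metrics-collector exists.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating "wait and retry" instead of "delete and recreate".
final class ZnodeInitWait {

    // Polls isInitialized until it returns true or maxAttempts checks have been made.
    // Returns true if initialization was observed, false if all attempts were used up.
    // Nothing is ever deleted here, so a concurrently starting instance is not disturbed.
    static boolean waitForInitialization(BooleanSupplier isInitialized,
                                         int maxAttempts,
                                         long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            // Sleep only between attempts, not after the last one.
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the real znode check (assumption): reports "initialized"
        // on the third poll, as if another instance finished setup in the meantime.
        AtomicInteger polls = new AtomicInteger();
        boolean ready = waitForInitialization(() -> polls.incrementAndGet() >= 3, 5, 10L);
        System.out.println("initialized=" + ready + " after " + polls.get() + " polls");
    }
}
```

If the helper returns false, the caller can fail startup with a clear error rather than deleting the znode, which is the step that let two instances cancel each other out.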