  1. Ambari
  2. AMBARI-21593

RU: AMS stopped after RU [AMS distributed mode]


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.5.2
    • Fix Version/s: 2.5.2
    • Component/s: ambari-metrics
    • Labels: None

    Description

      PROBLEM
      When 2 metric collectors are started up simultaneously, both of them fail to start.

      BUG
      A race condition exists in the Metric Collector HA controller initialization, introduced by AMBARI-20179. When a Helix controller instance finds that the /ambari-metrics-collector znode exists but a child node does not exist, it deletes the entire znode and recreates it. If another controller instance initializes at the same time, each instance can end up undoing the work of the other.

      FIX
      Do not delete and recreate the znode. Instead, wait and retry for a few seconds to check whether /ambari-metrics-collector has been fully initialized.
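
      The wait-and-retry idea in the fix can be sketched roughly as below. This is an illustration only, not the code from AMBARI-21593.patch: the class name ZnodeInitWaiter, the method waitUntilInitialized, and its parameters are hypothetical, and the ZooKeeper lookup is replaced by a plain BooleanSupplier so the sketch runs without a cluster. In a real collector the supplier would presumably check whether the expected child of /ambari-metrics-collector exists.

```java
import java.util.function.BooleanSupplier;

/**
 * Illustrative sketch only; not the code from AMBARI-21593.patch.
 * The class, method, and parameter names here are hypothetical.
 */
public class ZnodeInitWaiter {

    /**
     * Polls a check up to maxAttempts times, sleeping between attempts.
     * Returns true as soon as the check passes, false if attempts run out.
     * Nothing is deleted or recreated while waiting.
     */
    public static boolean waitUntilInitialized(BooleanSupplier isInitialized,
                                               int maxAttempts,
                                               long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isInitialized.getAsBoolean()) {
                return true;
            }
            // Sleep only if another attempt follows.
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for "the expected child of /ambari-metrics-collector exists":
        // here the check starts passing on the third poll, as if another
        // controller instance finished initializing in the meantime.
        int[] polls = {0};
        boolean ready = waitUntilInitialized(() -> ++polls[0] >= 3, 5, 100L);
        System.out.println("initialized=" + ready + " after " + polls[0] + " polls");
        // prints "initialized=true after 3 polls"
    }
}
```

      Because the waiting instance only reads state and never deletes the znode, two controllers starting together cannot cancel each other's initialization. What to do when the attempts run out is left to the caller in this sketch.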

      Attachments

        1. AMBARI-21593.patch
          3 kB
          Aravindan Vijayan

            People

              Assignee: Aravindan Vijayan (avijayan)
              Reporter: Aravindan Vijayan (avijayan)