  1. Ambari
  2. AMBARI-19416

Ambari agents remain in heartbeat lost state after ambari server restart

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • None
    • 2.5.0
    • None
    • None

    Description

      With the implementation of AMBARI-18505 (https://issues.apache.org/jira/browse/AMBARI-18505), status commands are executed in a separate child process. Status commands that the Ambari agent receives from the server are passed to the status command executor child process via a queue (multiprocessing.Queue()). If the child process is killed, either manually or by the parent process, the queue may be left in a bad state (see http://bugs.python.org/issue20527), so the re-spawned status command executor child process may no longer receive new status commands.
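
      The hand-off described above can be illustrated with a minimal sketch. It is not the Ambari agent code: the names status_executor, run_status_commands and the _STOP sentinel are hypothetical, and it assumes a POSIX host where the "fork" start method is available.

```python
import multiprocessing

# Assumption: a POSIX host where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")

_STOP = None  # hypothetical sentinel that tells the child to exit


def status_executor(command_queue, result_queue):
    """Child process loop: take status commands off the queue and 'execute' them."""
    while True:
        command = command_queue.get()  # blocks until the parent sends something
        if command is _STOP:
            break
        result_queue.put("executed " + command)


def run_status_commands(commands):
    """Parent side: pass each command to the child through a multiprocessing queue."""
    command_queue = _ctx.Queue()
    result_queue = _ctx.Queue()
    child = _ctx.Process(target=status_executor,
                         args=(command_queue, result_queue))
    child.start()
    for command in commands:
        command_queue.put(command)
    command_queue.put(_STOP)
    # Read the results before joining so the child can flush its queue.
    results = [result_queue.get(timeout=10) for _ in commands]
    child.join(timeout=10)
    return results
```

      If the child were killed while blocked in command_queue.get(), a later child reading from the same queue object could hang, which is the failure mode referenced in the Python issue linked above.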

      When the Ambari server is restarted, the agent re-registers with it and, upon re-registration, re-spawns the status command executor child process so that the child picks up up-to-date agent configs (https://issues.apache.org/jira/browse/AMBARI-19392). In this case the queue may get stuck, so the re-spawned child process does not receive status commands and the Ambari agent stays in heartbeat lost state.
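
      One common way to avoid reusing a possibly corrupted queue is to create fresh queues whenever the child is re-spawned. The sketch below shows that idea only. It is not taken from the attached patch and may differ from the actual fix. StatusExecutorSupervisor and its methods are hypothetical names, and the "fork" start method is again assumed.

```python
import multiprocessing

# Assumption: a POSIX host where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def status_executor(command_queue, result_queue):
    """Child process loop: echo back every status command it receives."""
    while True:
        result_queue.put("executed " + command_queue.get())


class StatusExecutorSupervisor:
    """Hypothetical parent-side helper that never reuses a queue across respawns."""

    def __init__(self):
        self.command_queue = None
        self.result_queue = None
        self.child = None

    def _discard_old(self):
        # Kill the old child, if any, then drop queues it may have corrupted.
        if self.child is not None and self.child.is_alive():
            self.child.terminate()
            self.child.join(timeout=10)
        for old_queue in (self.command_queue, self.result_queue):
            if old_queue is not None:
                old_queue.cancel_join_thread()  # do not block on unflushed data
                old_queue.close()

    def respawn(self):
        """(Re)start the child with brand-new queues, e.g. after re-registration."""
        self._discard_old()
        self.command_queue = _ctx.Queue()
        self.result_queue = _ctx.Queue()
        self.child = _ctx.Process(
            target=status_executor,
            args=(self.command_queue, self.result_queue),
            daemon=True,
        )
        self.child.start()

    def execute(self, command, timeout=10):
        """Send one command and wait for its result; raises queue.Empty if stuck."""
        self.command_queue.put(command)
        return self.result_queue.get(timeout=timeout)

    def stop(self):
        self._discard_old()
        self.child = None
        self.command_queue = None
        self.result_queue = None
```

      A usage sketch: call respawn() once at start-up and again after each re-registration, then call execute("STATUS_COMMAND") as commands arrive. Because the old queues are discarded together with the killed child, a lock left held by that child cannot block the new one.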

      Attachments

        1. AMBARI-19416.v3.patch
          6 kB
          Sebastian Toader

            People

              stoader Sebastian Toader
              stoader Sebastian Toader