Ambari / AMBARI-19416

Ambari agents remain in heartbeat lost state after ambari server restart

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Fix Version/s: 2.5.0

    Description

      With the implementation of https://issues.apache.org/jira/browse/AMBARI-18505, status commands are executed in a separate child process. Status commands that the Ambari agent receives from the server are passed to the status command executor child process through a queue (multiprocessing.Queue). If the child process is killed, either manually or by the parent process, the queue may be left in a bad state (see http://bugs.python.org/issue20527), so the re-spawned status command executor child process may no longer receive new status commands.
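The hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the function name `status_command_executor`, the command dictionary shape, the `DATANODE` component, and the `STARTED` placeholder status are all hypothetical, and the sketch assumes a POSIX host where the `fork` start method is available.

```python
import multiprocessing
import queue


def status_command_executor(command_queue, result_queue):
    """Hypothetical child loop: consume status commands until a None sentinel arrives."""
    while True:
        command = command_queue.get()
        if command is None:  # sentinel asking the loop to exit cleanly
            break
        # Placeholder for the real status check of the named component.
        result_queue.put({"component": command["component"], "status": "STARTED"})


if __name__ == "__main__":
    # Assumption: fork is available (POSIX hosts).
    ctx = multiprocessing.get_context("fork")
    command_queue = ctx.Queue()
    result_queue = ctx.Queue()
    child = ctx.Process(
        target=status_command_executor, args=(command_queue, result_queue)
    )
    child.start()

    # The parent forwards a status command and waits for the child's answer.
    command_queue.put({"component": "DATANODE"})
    print(result_queue.get(timeout=10))

    # Clean shutdown via the sentinel. Killing the child instead is the
    # situation in which the queue can be left in a bad state.
    command_queue.put(None)
    child.join(timeout=10)
```

The loop only relies on `get()` and `put()`, so it can also be exercised in-process with a plain `queue.Queue` when no child process is wanted.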

      When the Ambari server is restarted, the agent re-registers with it and, upon re-registration, re-spawns the status command child process so that the child picks up up-to-date agent configs (https://issues.apache.org/jira/browse/AMBARI-19392). If the queue gets stuck at that point, the status command executor child process never receives the status commands, and the Ambari agent stays in heartbeat lost state.
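One possible way to keep a re-spawned child from inheriting a stuck queue is to discard the old queue and create a new one on every re-spawn. The sketch below illustrates that idea only and does not claim to reproduce the attached patch: the class `StatusExecutorHandle`, its methods, and `_executor_loop` are hypothetical names, and the `fork` start method is assumed.

```python
import multiprocessing


def _executor_loop(command_queue):
    """Hypothetical child loop: drain commands until a None sentinel arrives."""
    while True:
        if command_queue.get() is None:
            break


class StatusExecutorHandle:
    """Hypothetical owner of the child process and the queue that feeds it."""

    def __init__(self):
        # Assumption: fork is available (POSIX hosts).
        self._ctx = multiprocessing.get_context("fork")
        self.queue = None
        self.process = None

    def respawn(self):
        """Kill the current child, if any, and start a new one on a fresh queue."""
        if self.process is not None and self.process.is_alive():
            self.process.terminate()
            self.process.join(timeout=5)
        if self.queue is not None:
            # The old queue may be in a bad state after the kill, so it is
            # dropped rather than reused. cancel_join_thread() keeps the parent
            # from blocking on data that can no longer be flushed.
            self.queue.cancel_join_thread()
            self.queue.close()
        self.queue = self._ctx.Queue()
        self.process = self._ctx.Process(target=_executor_loop, args=(self.queue,))
        self.process.daemon = True
        self.process.start()

    def stop(self):
        """Ask the child to exit via the sentinel, killing it only as a fallback."""
        if self.process is None:
            return
        self.queue.put(None)
        self.process.join(timeout=5)
        if self.process.is_alive():
            self.process.terminate()
            self.process.join(timeout=5)
```

In this sketch a re-registration handler would call `respawn()` and then send every later status command through `handle.queue`, never through a reference to the queue saved before the re-spawn.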

      Attachments

        1. AMBARI-19416.v3.patch
          6 kB
          Sebastian Toader

          People

            Assignee: Sebastian Toader (stoader)
            Reporter: Sebastian Toader (stoader)
            Votes: 0
            Watchers: 2
