Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
With the implementation of https://issues.apache.org/jira/browse/AMBARI-18505, status commands are executed in a separate child process. Status commands that the Ambari agent receives from the server are passed to the status command executor child process via a queue (multiprocessing.Queue()). If the child process is killed, either manually or by the parent process, the queue may end up in a bad state (see: http://bugs.python.org/issue20527), so the re-spawned status command executor child process may no longer receive new status commands.
When the Ambari server is restarted, the agent re-registers with it and, upon re-registration, re-spawns the status command child process in order to pick up up-to-date agent configs (https://issues.apache.org/jira/browse/AMBARI-19392). In this case the re-spawned status command executor child process may not receive status commands because the queue can get stuck, which leaves the Ambari agent in the heartbeat lost state.
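One way to avoid reusing a possibly corrupted queue is to create fresh queues every time the child process is re-spawned. The sketch below illustrates that idea only; it is not taken from the Ambari agent code. StatusExecutorHandle, _status_worker and the "handled ..." replies are hypothetical names, the "fork" start method is assumed (POSIX hosts), and commands still pending in the old queue are assumed to be safe to discard.

```python
import multiprocessing

# Assumption: POSIX host, so the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def _status_worker(command_queue, result_queue):
    """Child process: handle status commands until a None sentinel arrives."""
    while True:
        command = command_queue.get()
        if command is None:
            break
        result_queue.put("handled " + command)


class StatusExecutorHandle:
    """Hypothetical wrapper that pairs each child process with its own queues."""

    def __init__(self):
        self.process = None
        self.command_queue = None
        self.result_queue = None

    def spawn(self):
        # Fresh queues on every spawn: nothing is shared with a previous
        # child that may have been killed while holding a queue lock.
        self.command_queue = _ctx.Queue()
        self.result_queue = _ctx.Queue()
        self.process = _ctx.Process(
            target=_status_worker,
            args=(self.command_queue, self.result_queue),
            daemon=True,
        )
        self.process.start()

    def kill(self):
        # Terminate the child abruptly, as happens on re-registration.
        if self.process is not None:
            self.process.terminate()
            self.process.join()
            self.process = None

    def respawn(self):
        # Old queues are dropped, so commands still pending in them are lost.
        self.kill()
        self.spawn()

    def run_command(self, command, timeout=5.0):
        # Raises queue.Empty if the child does not answer within the timeout.
        self.command_queue.put(command)
        return self.result_queue.get(timeout=timeout)
```

With this shape, calling respawn() after a server restart gives the new child a queue whose internal locks were never held by the killed process, at the cost of dropping any status commands that had not yet been consumed.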