Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Reviewed
Description
On a Debian machine, we have seen NodeManager recovery of containers fail because the signal syntax for a process group may not work. The check for whether the process is alive returns an error during container recovery, which causes the container to be declared LOST (exit code 154) on a NodeManager restart.
The application then fails with the error below, and the attempt is not retried.
Application application_1439244348718_0001 failed 1 times due to Attempt recovered after RM restartAM Container for appattempt_1439244348718_0001_000001 exited with exitCode: 154
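The liveness check described above can be illustrated with a minimal sketch. It assumes the check is done with `kill -0`, which sends no signal and only tests whether the target exists, and that a process group is addressed by a negative ID. The class `ProcessAliveCheck` and the method `checkAliveCommand` are hypothetical names for illustration, not Hadoop's actual API. The sketch only builds the two argument forms whose acceptance may differ between `kill` implementations; it does not run them.

```java
import java.util.Arrays;

public class ProcessAliveCheck {

    /**
     * Builds the argv for a "is this process (group) alive?" probe.
     * Hypothetical helper for illustration; not part of Hadoop.
     *
     * @param pid            process ID or process group ID, as a decimal string
     * @param isProcessGroup true to address the whole process group (negative ID)
     * @param useDoubleDash  true to put "--" before the negative ID so that it
     *                       is not parsed as an option by the kill utility
     */
    static String[] checkAliveCommand(String pid, boolean isProcessGroup,
                                      boolean useDoubleDash) {
        if (!isProcessGroup) {
            // A single process needs no special syntax.
            return new String[] {"kill", "-0", pid};
        }
        // A process group is addressed by negating the ID.
        String target = "-" + pid;
        return useDoubleDash
            ? new String[] {"kill", "-0", "--", target}
            : new String[] {"kill", "-0", target};
    }

    public static void main(String[] args) {
        // Prints: [kill, -0, --, -12345]
        System.out.println(Arrays.toString(checkAliveCommand("12345", true, true)));
        // Prints: [kill, -0, -12345]
        System.out.println(Arrays.toString(checkAliveCommand("12345", true, false)));
    }
}
```

If the `kill` available on a host rejects the form in use, the probe exits with an error even though the container is still running, which matches the symptom reported here: the container is treated as gone and marked LOST during recovery.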
Attachments
Issue Links
- breaks
  - HADOOP-12441 Fix kill command behavior under some Linux distributions. (Resolved)
  - YARN-3561 Non-AM Containers continue to run even after AM is stopped (Resolved)
- duplicates
  - YARN-3561 Non-AM Containers continue to run even after AM is stopped (Resolved)
- supercedes
  - HADOOP-11989 Kill command for process group id throws ExitCodeException (Resolved)