Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- None
- None
- None
- Reviewed
Description
After HADOOP-12317, executing the kill command fails on Ubuntu 12. After the NM restarts, it cannot determine from a container's pid whether the process is still alive, and it cannot kill the process correctly when the RM/AM tells the NM to kill a container.
Logs from NM (customized logs):
2015-09-25 21:58:59,348 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(431)) - ================== check alive cmd:[[Ljava.lang.String;@496e442d]
2015-09-25 21:58:59,349 INFO nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=hrt_qa IP=10.0.1.14 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1443218269460_0001 CONTAINERID=container_1443218269460_0001_01_000001
2015-09-25 21:58:59,363 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(438)) - ===========================
ExitCodeException exitCode=1: ERROR: garbage process ID "--".
Usage:
  kill pid ... Send SIGTERM to every process listed.
  kill signal pid ... Send a signal to every process listed.
  kill -s signal pid ... Send a signal to every process listed.
  kill -l List all signal names.
  kill -L List all signal names in a nice table.
  kill -l signal Convert between signal numbers and names.
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:550)
    at org.apache.hadoop.util.Shell.run(Shell.java:461)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:727)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.containerIsAlive(DefaultContainerExecutor.java:432)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.signalContainer(DefaultContainerExecutor.java:401)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:419)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
    at java.lang.Thread.run(Thread.java:745)
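The error in the log shows that the kill binary on this host treated "--" as a process ID. As a rough illustration only, the sketch below builds two argument vectors for a liveness check (signal 0), one with the "--" end-of-options marker and one without. The class KillCommandSketch and its methods are hypothetical and are not the Hadoop Shell API. The use of a negative pid to address a process group, and the pid value 12345, are assumptions made for the example. Which form a given kill implementation accepts depends on the distribution. The sketch also prints the array with Arrays.toString, since logging a String[] directly produces an identity string such as the [Ljava.lang.String;@496e442d seen in the first log line.

```java
import java.util.Arrays;

public class KillCommandSketch {

    // Hypothetical helper, not the Hadoop Shell API: builds the argv for
    // signalling one process or (assumption) a whole process group.
    // useDoubleDash adds the "--" end-of-options marker, which the kill
    // binary in the log above rejected as a "garbage process ID".
    static String[] buildKillCommand(int signal, String pid,
                                     boolean processGroup,
                                     boolean useDoubleDash) {
        String target = processGroup ? "-" + pid : pid;
        if (useDoubleDash) {
            return new String[] {"kill", "-" + signal, "--", target};
        }
        return new String[] {"kill", "-" + signal, target};
    }

    // Renders the argv contents; concatenating a String[] into a log
    // message directly would print its type and identity hash instead.
    static String describe(String[] cmd) {
        return Arrays.toString(cmd);
    }

    public static void main(String[] args) {
        // 12345 is a made-up pid used only for illustration.
        String[] withDashes = buildKillCommand(0, "12345", true, true);
        String[] withoutDashes = buildKillCommand(0, "12345", true, false);
        System.out.println("check alive cmd: " + describe(withDashes));
        System.out.println("check alive cmd: " + describe(withoutDashes));
    }
}
```

Running main prints the two candidate commands in readable form, which makes it easier to see from a log which variant was actually executed.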
Attachments
Issue Links
- breaks
  - HADOOP-12463 TestShell.testGetSignalKillCommand failing on windows (Open)
  - HADOOP-13770 Shell.checkIsBashSupported swallowed an interrupted exception (Resolved)
- is broken by
  - HADOOP-12317 Applications fail on NM restart on some linux distro because NM container recovery declares AM container as LOST (Resolved)
- relates to
  - HADOOP-13467 Shell#getSignalKillCommand should use the bash builtin on Linux (Resolved)