Hadoop Common / HADOOP-12441

Fix kill command behavior under some Linux distributions.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      After HADOOP-12317, the kill command fails on Ubuntu 12. As a result, after the NM restarts it cannot determine from a container's pid whether the process is still alive, and it cannot kill the process correctly when the RM/AM tells the NM to kill a container.

      Logs from NM (customized logs):

      2015-09-25 21:58:59,348 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(431)) -  ================== check alive cmd:[[Ljava.lang.String;@496e442d]
      2015-09-25 21:58:59,349 INFO  nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=hrt_qa       IP=10.0.1.14    OPERATION=Stop Container Request        TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1443218269460_0001    CONTAINERID=container_1443218269460_0001_01_000001
      2015-09-25 21:58:59,363 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(438)) -  ===========================
      ExitCodeException exitCode=1: ERROR: garbage process ID "--".
      Usage:
        kill pid ...              Send SIGTERM to every process listed.
        kill signal pid ...       Send a signal to every process listed.
        kill -s signal pid ...    Send a signal to every process listed.
        kill -l                   List all signal names.
        kill -L                   List all signal names in a nice table.
        kill -l signal            Convert between signal numbers and names.
      
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:550)
              at org.apache.hadoop.util.Shell.run(Shell.java:461)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:727)
              at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.containerIsAlive(DefaultContainerExecutor.java:432)
              at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.signalContainer(DefaultContainerExecutor.java:401)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:419)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
              at java.lang.Thread.run(Thread.java:745)
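
      The `garbage process ID "--"` error above suggests that the external `kill` binary on the affected distribution does not accept `--` as an end-of-options marker. Below is a minimal sketch of one possible workaround, assuming the liveness check sends signal 0 to the container's process group: route the call through bash so that its builtin `kill`, which does accept `--`, handles the arguments. The class and method names are hypothetical and are not taken from the attached patches.

```java
import java.util.Arrays;

// Hypothetical sketch; not the code from the HADOOP-12441 patches.
final class KillCommandSketch {

    // Rejects anything that is not a plain decimal pid, since the pid is
    // later spliced into a shell command string.
    private static String checkPid(String pid) {
        if (pid == null || !pid.matches("\\d+")) {
            throw new IllegalArgumentException("not a numeric pid: " + pid);
        }
        return pid;
    }

    // Calls the external kill binary directly. Some kill implementations
    // reject the "--" argument, which matches the error in the log above.
    static String[] externalKill(int signal, String pid) {
        return new String[] {"kill", "-" + signal, "--", "-" + checkPid(pid)};
    }

    // Runs kill as a bash builtin instead. The negative pid addresses the
    // whole process group, and "--" stops it being parsed as an option.
    static String[] bashBuiltinKill(int signal, String pid) {
        return new String[] {
            "bash", "-c", "kill -" + signal + " -- -" + checkPid(pid)
        };
    }

    public static void main(String[] args) {
        // Signal 0 sends nothing; it only checks that the target exists.
        System.out.println(Arrays.toString(externalKill(0, "12345")));
        System.out.println(Arrays.toString(bashBuiltinKill(0, "12345")));
    }
}
```

      Printing the command with `Arrays.toString` also shows its actual arguments, rather than the `[Ljava.lang.String;@496e442d` array identity that appears in the check-alive log line above.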
      

      Attachments

        1. HADOOP-12441.1.patch
          7 kB
          Wangda Tan
        2. HADOOP-12441.2.patch
          9 kB
          Wangda Tan
        3. HADOOP-12441.4.patch
          6 kB
          Wangda Tan

            People

              Assignee: Wangda Tan (leftnoteasy)
              Reporter: Wangda Tan (leftnoteasy)
              Votes: 0
              Watchers: 8