Hadoop Common / HADOOP-12441

Fix kill command behavior under some Linux distributions.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Since HADOOP-12317, the kill command fails under Ubuntu 12. After a NodeManager (NM) restart, the NM cannot tell from a container's PID whether its process is still alive, and it cannot kill the process correctly when the ResourceManager (RM) or ApplicationMaster (AM) tells it to kill a container.
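      A minimal sketch of the liveness check described above, assuming the NM runs the external `kill` binary with signal 0 against the container's process group (`kill -0 -- -<pid>`), as the `--` in the error message suggests. `KillCheckSketch`, `checkAliveCommand` and `isAlive` are hypothetical names, not Hadoop's `Shell` or `DefaultContainerExecutor` API. It also assumes, as the stack trace hints, that any non-zero exit is read as "not alive", which would make a usage error from `kill` indistinguishable from a dead container.

```java
import java.io.IOException;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hadoop's Shell API): build a kill(1) liveness
 * check for a container's process group and interpret its exit code.
 */
public class KillCheckSketch {

    /** argv for "kill -0 -- -<pid>": signal 0 only tests existence; "-<pid>" targets the process group. */
    static String[] checkAliveCommand(String pid) {
        return new String[] {"kill", "-0", "--", "-" + pid};
    }

    /**
     * Runs the check without a shell, so "kill" resolves to the external binary.
     * Assumption: any non-zero exit is treated as "not alive", so a usage error
     * such as 'garbage process ID "--"' looks exactly like a dead process.
     */
    static boolean isAlive(String pid) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(checkAliveCommand(pid))
                .redirectErrorStream(true)
                .start();
        p.getInputStream().readAllBytes(); // drain output so the child cannot block on a full pipe
        return p.waitFor() == 0;
    }

    public static void main(String[] args) {
        String[] cmd = checkAliveCommand("12345");
        // Arrays.toString prints the arguments; logging the array object itself
        // prints only its type and identity hash, e.g. "[Ljava.lang.String;@496e442d".
        System.out.println(Arrays.toString(cmd));
    }
}
```

      The `check alive cmd` log line shows exactly that identity-hash form, so the customized logging did not reveal the actual arguments passed to `kill`.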

      Logs from the NM (with custom logging added):

      2015-09-25 21:58:59,348 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(431)) -  ================== check alive cmd:[[Ljava.lang.String;@496e442d]
      2015-09-25 21:58:59,349 INFO  nodemanager.NMAuditLogger (NMAuditLogger.java:logSuccess(89)) - USER=hrt_qa       IP=10.0.1.14    OPERATION=Stop Container Request        TARGET=ContainerManageImpl      RESULT=SUCCESS  APPID=application_1443218269460_0001    CONTAINERID=container_1443218269460_0001_01_000001
      2015-09-25 21:58:59,363 INFO  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:containerIsAlive(438)) -  ===========================
      ExitCodeException exitCode=1: ERROR: garbage process ID "--".
      Usage:
        kill pid ...              Send SIGTERM to every process listed.
        kill signal pid ...       Send a signal to every process listed.
        kill -s signal pid ...    Send a signal to every process listed.
        kill -l                   List all signal names.
        kill -L                   List all signal names in a nice table.
        kill -l signal            Convert between signal numbers and names.
      
              at org.apache.hadoop.util.Shell.runCommand(Shell.java:550)
              at org.apache.hadoop.util.Shell.run(Shell.java:461)
              at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:727)
              at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.containerIsAlive(DefaultContainerExecutor.java:432)
              at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.signalContainer(DefaultContainerExecutor.java:401)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.cleanupContainer(ContainerLaunch.java:419)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:139)
              at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher.handle(ContainersLauncher.java:55)
              at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
              at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
              at java.lang.Thread.run(Thread.java:745)
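      The `kill` usage text in this log shows that the external `kill` on this system rejects `--` as a process ID. One possible workaround, sketched here under assumptions and not necessarily the change committed for this issue, is to run the command through `bash -c` so that bash's builtin `kill`, which like other bash builtins accepts `--` to end option parsing, is used instead of the external binary. `BashKillSketch` and `signalProcessGroupCommand` are hypothetical names; the sketch also assumes the process-group form `-<pid>` is wanted and that `bash` is on the `PATH`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of one workaround: run kill through "bash -c" so that
 * bash's builtin kill, which accepts "--", is used instead of the external
 * kill binary. Names are illustrative, not Hadoop's API.
 */
public class BashKillSketch {

    private static final Pattern NUMERIC_PID = Pattern.compile("[0-9]+");

    /** argv for: bash -c "kill -<signal> -- -<pid>" (signals the whole process group). */
    static String[] signalProcessGroupCommand(int signal, String pid) {
        // The pid is spliced into a shell string, so reject anything
        // non-numeric to avoid shell injection.
        if (!NUMERIC_PID.matcher(pid).matches()) {
            throw new IllegalArgumentException("not a numeric pid: " + pid);
        }
        if (signal < 0) {
            throw new IllegalArgumentException("negative signal: " + signal);
        }
        return new String[] {"bash", "-c", "kill -" + signal + " -- -" + pid};
    }

    public static void main(String[] args) {
        // Signal 0 is the liveness check; 15 (SIGTERM) would be the kill path.
        System.out.println(Arrays.toString(signalProcessGroupCommand(0, "12345")));
        System.out.println(Arrays.toString(signalProcessGroupCommand(15, "12345")));
    }
}
```

      Because the PID ends up inside a shell string rather than being passed as a separate argument, the sketch validates it first; passing argv directly to the external `kill` avoids that concern but is exactly the form that fails here.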
      

          People

            Assignee: Wangda Tan (leftnoteasy)
            Reporter: Wangda Tan (leftnoteasy)
            Votes: 0
            Watchers: 8