  1. Hadoop YARN
  2. YARN-8231

Dshell application fails when one of the Docker containers is killed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Invalid

    Description

      1) Launch the distributed shell (dshell) application

      yarn jar hadoop-yarn-applications-distributedshell-*.jar \
        -shell_command "sleep 300" \
        -num_containers 2 \
        -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
        -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos/httpd-24-centos7:latest \
        -keep_containers_across_application_attempts \
        -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell-*.jar

      2) Kill container_1524681858728_0012_01_000002
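
The report does not say how the container was killed. One possible way to reproduce step 2 is sketched below, assuming a Hadoop release whose `yarn container` subcommand supports `-signal`. The helper name `kill_container` and the `DRY_RUN` switch are hypothetical, and killing the container directly on the NodeManager host (for example through the Docker CLI) may behave differently.

```shell
# Hypothetical helper: forcefully stop one YARN container by ID.
# By default (DRY_RUN=1) it only prints the command, so it is safe to try
# on a machine without a cluster. Set DRY_RUN=0 to actually run it.
kill_container() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "yarn container -signal $1 FORCEFUL_SHUTDOWN"
  else
    yarn container -signal "$1" FORCEFUL_SHUTDOWN
  fi
}

kill_container container_1524681858728_0012_01_000002
```

With the default dry-run setting this only echoes the command for the container ID used in this report.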

      Expected behavior:
      The application should start a new container instance and finish successfully.

      Actual behavior:
      The application failed as soon as the container was killed.

      AM log
      18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
      18/04/27 23:05:12 INFO distributedshell.ApplicationMaster: appattempt_1524681858728_0012_000001 got container status for containerID=container_1524681858728_0012_01_000002, state=COMPLETE, exitStatus=137, diagnostics=[2018-04-27 23:05:09.310]Container killed on request. Exit code is 137
      [2018-04-27 23:05:09.331]Container exited with a non-zero exit code 137. 
      [2018-04-27 23:05:09.332]Killed by external signal
      
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: appattempt_1524681858728_0012_000001 got container status for containerID=container_1524681858728_0012_01_000003, state=COMPLETE, exitStatus=0, diagnostics=
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1524681858728_0012_01_000003
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
      18/04/27 23:08:46 INFO distributedshell.ApplicationMaster: Diagnostics., total=2, completed=2, allocated=2, failed=1
      18/04/27 23:08:46 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
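
The log above can be read with a few small shell helpers, sketched here under stated assumptions. An exit status above 128 conventionally means the process was terminated by signal `status - 128`, so 137 corresponds to signal 9 (SIGKILL). The function names are hypothetical, and the success rule in `am_verdict` (no failed containers and all requested containers completed) is an assumption about the distributed shell AM, not something the report confirms.

```shell
# Map an exit status to the signal number that caused it (0 if none).
signal_from_exit_status() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

# Extract one counter (total, completed, allocated, failed) from the
# AM's "Diagnostics." log line.
diag_counter() {
  printf '%s\n' "$1" | sed -n "s/.*[ ,]$2=\([0-9][0-9]*\).*/\1/p"
}

# Assumed rule: the run succeeds only if nothing failed and at least
# "total" containers completed. Arguments: total completed failed.
am_verdict() {
  if [ "$3" -eq 0 ] && [ "$2" -ge "$1" ]; then
    echo SUCCEEDED
  else
    echo FAILED
  fi
}

line='Diagnostics., total=2, completed=2, allocated=2, failed=1'
echo "signal: $(signal_from_exit_status 137)"
echo "verdict: $(am_verdict "$(diag_counter "$line" total)" \
                            "$(diag_counter "$line" completed)" \
                            "$(diag_counter "$line" failed)")"
```

Under that assumed rule, the killed container is counted as both completed and failed, which would explain why the application is reported as failed even though the second container exited with status 0.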

          People

            Assignee: Unassigned
            Reporter: Yesha Vora (yeshavora)
            Votes: 0
            Watchers: 5
