  1. Hadoop YARN
  2. YARN-3080

The DockerContainerExecutor does not write the correct PID to the container pidFile


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: nodemanager

    Description

      The generated docker_container_executor_session.sh looks like this:

      #!/usr/bin/env bash

      echo `/usr/bin/docker inspect --format .State.Pid container_1421723685222_0008_01_000002` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp
      /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid
      /usr/bin/docker run --rm --name container_1421723685222_0008_01_000002 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_000002 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002/launch_container.sh"

      The DockerContainerExecutor runs docker inspect before docker run. At that point the container does not exist yet, so docker inspect cannot return the correct PID for the container, and signalContainer() and NM restart would fail.
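One possible way to avoid the ordering problem is to look up the PID only once the container is actually running. The following is a minimal sketch under that assumption, not the approach taken in the attached patches: the helper name `write_container_pid`, the `DOCKER_BIN` variable, and the retry count are all hypothetical. It polls `docker inspect` until a non-zero PID is reported, then writes the pid file atomically with the same tmp-then-mv pattern as the original script.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): poll `docker inspect` until the
# container reports a non-zero PID, then write the pid file atomically.
DOCKER_BIN="${DOCKER_BIN:-/usr/bin/docker}"

write_container_pid() {
  local name="$1" pid_file="$2" retries="${3:-50}" pid i
  for ((i = 0; i < retries; i++)); do
    # State.Pid is 0 (or the command fails) while the container is not running.
    pid="$("$DOCKER_BIN" inspect --format '{{.State.Pid}}' "$name" 2>/dev/null)"
    if [[ "$pid" =~ ^[1-9][0-9]*$ ]]; then
      echo "$pid" > "${pid_file}.tmp"
      mv -f "${pid_file}.tmp" "$pid_file"
      return 0
    fi
    sleep 0.1
  done
  return 1  # the container never reported a running PID
}
```

Because `docker run` in the session script stays in the foreground, such a helper would have to be started in the background just before it, for example `write_container_pid "$CONTAINER_NAME" "$PID_FILE" &` followed by the existing `docker run` line, where `CONTAINER_NAME` and `PID_FILE` are placeholder variables for the values shown in the script above.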

      Attachments

        1. YARN-3080.patch
          40 kB
          Abin Shahab
        2. YARN-3080.patch
          41 kB
          Abin Shahab
        3. YARN-3080.patch
          42 kB
          Abin Shahab
        4. YARN-3080.patch
          42 kB
          Abin Shahab
        5. YARN-3080.patch
          42 kB
          Abin Shahab

            People

              ashahab Abin Shahab
              beckham007 Beckham007