  1. Hadoop YARN
  2. YARN-3080

The DockerContainerExecutor could not write the right pid to container pidFile

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: nodemanager
    • Labels:

      Description

      The docker_container_executor_session.sh script looks like this:

      #!/usr/bin/env bash

      echo `/usr/bin/docker inspect --format {{.State.Pid}} container_1421723685222_0008_01_000002` > /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp
      /bin/mv -f /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid.tmp /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_000002/container_1421723685222_0008_01_000002.pid
      /usr/bin/docker run --rm --name container_1421723685222_0008_01_000002 -e GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e GAIA_CONTAINER_ID=container_1421723685222_0008_01_000002 --memory=32M --cpu-shares=1024 -v /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_000002 -v /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002 -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_000002/launch_container.sh"

      The DockerContainerExecutor runs docker inspect before docker run, so the container does not exist yet and docker inspect cannot return its pid. The pid file therefore never holds the right value, and both signalContainer() and NM restart fail.
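
      The ordering problem above can be illustrated with a minimal sketch. This is not the fix from the attached patch. It assumes the script is changed to start docker run first and only then poll docker inspect until a non-zero pid is reported. The function name write_container_pid, the DOCKER variable, and the retry and delay parameters are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: record the container pid only after the container runs.
# DOCKER and write_container_pid are illustrative names, not part of Hadoop.

DOCKER="${DOCKER:-/usr/bin/docker}"

# write_container_pid NAME PIDFILE [TRIES] [DELAY_SECONDS]
# Polls `docker inspect` until it reports a non-zero pid, then writes the pid
# to PIDFILE through a temporary file and an atomic rename.
# Returns 1 if no valid pid shows up within TRIES attempts.
write_container_pid() {
  local name="$1" pidfile="$2" tries="${3:-30}" delay="${4:-1}"
  local pid="" i
  for ((i = 0; i < tries; i++)); do
    # inspect fails while the container does not exist yet
    pid="$("$DOCKER" inspect --format '{{.State.Pid}}' "$name" 2>/dev/null)" || pid=""
    # a container that is not running reports pid 0, so require a positive integer
    if [[ "$pid" =~ ^[1-9][0-9]*$ ]]; then
      printf '%s\n' "$pid" > "${pidfile}.tmp"
      mv -f "${pidfile}.tmp" "$pidfile"
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Intended use (assumption: docker run is started in the background first):
#   "$DOCKER" run --rm --name "$name" ... &
#   run_pid=$!
#   write_container_pid "$name" "$pidfile"
#   wait "$run_pid"
```

      Because docker run with --rm stays in the foreground until the container exits, the sketch relies on the caller starting it in the background (or detached) before polling. Whether that suits the NodeManager's process tracking is a separate design question that this example does not settle.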

        Attachments

        1. YARN-3080.patch
          42 kB
          Abin Shahab
        2. YARN-3080.patch
          42 kB
          Abin Shahab
        3. YARN-3080.patch
          42 kB
          Abin Shahab
        4. YARN-3080.patch
          41 kB
          Abin Shahab
        5. YARN-3080.patch
          40 kB
          Abin Shahab

              People

              • Assignee: Abin Shahab (ashahab)
              • Reporter: Beckham007 (beckham007)
              • Votes: 0
              • Watchers: 14
