  Hadoop YARN: YARN-9660, a sub-task of YARN-8472 (YARN Container Phase 2)

Enhance documentation of Docker on YARN support


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.0
    • Component/s: documentation, nodemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Right now, using Docker on YARN has some hard requirements. If these requirements are not met, launching the container fails and an error message is printed. Depending on how familiar the user is with Docker, it may or may not be easy for them to understand what went wrong and how to fix the underlying problem.

      These requirements should be documented explicitly, along with the error messages that appear when they are not met.

      #1: CGroups handler cannot be systemd

      If the Docker daemon runs with the systemd cgroups handler, we receive the following error upon launching a container:

      Container id: container_1561638268473_0006_01_000002
      Exit code: 7
      Exception message: Launch container failed
      Shell error output: /usr/bin/docker-current: Error response from daemon: cgroup-parent for systemd cgroup should be a valid slice named as "xxx.slice".
      See '/usr/bin/docker-current run --help'.
      Shell output: main : command provided 4
      main : run as user is johndoe
      main : requested yarn user is johndoe
      

      Solution: switch the Docker daemon to the cgroupfs driver. How to do this can be OS-specific, but we can document an example using systemctl.
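      A minimal sketch of the configuration change, assuming the daemon reads its options from /etc/docker/daemon.json (the usual location on Linux). Some distribution packages, possibly including the one that provides /usr/bin/docker-current, may instead pass the driver as an --exec-opt flag in the systemd unit, in which case that flag needs the same change. The exec-opts key and the native.cgroupdriver option are Docker's own; the helper name force_cgroupfs is hypothetical. After writing the file, the daemon must be restarted (for example with systemctl restart docker) for the change to take effect.

```python
import json

# Docker daemon option that selects the cgroup driver ("systemd" or "cgroupfs").
CGROUP_DRIVER_PREFIX = "native.cgroupdriver="


def force_cgroupfs(config: dict) -> dict:
    """Return a copy of a daemon.json dict with the cgroup driver set to cgroupfs.

    Hypothetical helper: other exec-opts and top-level keys are preserved,
    and the input dict is not modified.
    """
    updated = dict(config)
    # Drop any existing cgroup driver option, keep the remaining exec-opts.
    opts = [
        opt for opt in config.get("exec-opts", [])
        if not opt.startswith(CGROUP_DRIVER_PREFIX)
    ]
    opts.append(CGROUP_DRIVER_PREFIX + "cgroupfs")
    updated["exec-opts"] = opts
    return updated


if __name__ == "__main__":
    # Example input: a daemon.json that currently selects the systemd driver.
    current = {"exec-opts": ["native.cgroupdriver=systemd"], "log-driver": "json-file"}
    # Content to write back to /etc/docker/daemon.json before restarting dockerd.
    print(json.dumps(force_cgroupfs(current), indent=2))
```

      After the restart, docker info should report "Cgroup Driver: cgroupfs".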

       

      #2: /bin/bash must be present on the $PATH inside the container

      Some smaller images, such as "busybox" or "alpine", do not include /bin/bash: their commands under /bin are links to /bin/busybox, and the only shell is /bin/sh.

      If we try to use such images, we see the following error message:

      Container id: container_1561638268473_0015_01_000002
      Exit code: 7
      Exception message: Launch container failed
      Shell error output: /usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:235: starting container process caused "exec: \"bash\": executable file not found in $PATH".
      Shell output: main : command provided 4
      main : run as user is johndoe
      main : requested yarn user is johndoe
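      One way to catch this before submitting an application is to probe the image for the commands the launch relies on. The sketch below is an illustration under stated assumptions, not part of YARN: it overrides the image entrypoint with /bin/sh (present in busybox and alpine, but not in every image) and uses the POSIX command -v builtin to report names missing from $PATH. The function names and the MISSING: output marker are hypothetical, and the probe sees the image's default $PATH, which may differ from the environment YARN sets up for the container.

```python
import shlex
import subprocess


def probe_script(commands) -> str:
    """Shell loop that prints MISSING:<name> for each command not on $PATH."""
    names = " ".join(shlex.quote(c) for c in commands)
    return f'for c in {names}; do command -v "$c" >/dev/null 2>&1 || echo "MISSING:$c"; done'


def probe_argv(image: str, commands=("bash",)) -> list:
    """docker run arguments that execute the probe with /bin/sh inside the image."""
    return ["docker", "run", "--rm", "--entrypoint", "/bin/sh", image,
            "-c", probe_script(commands)]


def parse_missing(stdout: str) -> list:
    """Extract the command names the probe reported as missing."""
    marker = "MISSING:"
    return [line[len(marker):] for line in stdout.splitlines() if line.startswith(marker)]


def missing_commands(image: str, commands=("bash",)) -> list:
    """Run the probe (needs a local Docker daemon) and return the missing commands.

    Raises CalledProcessError if the container cannot start, e.g. no /bin/sh.
    """
    result = subprocess.run(probe_argv(image, commands),
                            capture_output=True, text=True, check=True)
    return parse_missing(result.stdout)


if __name__ == "__main__":
    # Check a small image for bash before using it with YARN.
    print(missing_commands("busybox"))
```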
      

       

      #3: The find command must be available on the $PATH

      It may seem obvious that find is available, but even popular images such as fedora require installing it separately.

      If find is not available, launch_container.sh fails with:

      [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
      Last 4096 bytes of prelaunch.err :
      /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh: line 44: find: command not found
      Last 4096 bytes of stderr.txt :
      [2019-07-01 03:51:25.053]Container exited with a non-zero exit code 127. Error file: prelaunch.err.
      Last 4096 bytes of prelaunch.err :
      /tmp/hadoop-systest/nm-local-dir/usercache/systest/appcache/application_1561638268473_0017/container_1561638268473_0017_01_000002/launch_container.sh: line 44: find: command not found
      Last 4096 bytes of stderr.txt :
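      Since the goal is to document each requirement next to the error it produces, the mapping can also be expressed as a small lookup over the container diagnostics. A minimal sketch: the patterns come from the three outputs quoted above (the second pattern tolerates the escaped quotes around bash), while the function name diagnose and the hint texts are hypothetical and would need extending as more failure modes are documented.

```python
import re

# (pattern, hint) pairs built from the error output quoted in this issue.
KNOWN_FAILURES = [
    (re.compile(r"cgroup-parent for systemd cgroup should be a valid slice"),
     "#1: Docker uses the systemd cgroup driver; switch it to cgroupfs"),
    (re.compile(r'exec: \\?"bash\\?": executable file not found in \$PATH'),
     "#2: bash is not on the $PATH inside the image"),
    (re.compile(r"launch_container\.sh: line \d+: find: command not found"),
     "#3: find is not on the $PATH inside the image"),
]


def diagnose(diagnostics: str) -> list:
    """Return a hint for every known requirement violation found in the text."""
    return [hint for pattern, hint in KNOWN_FAILURES if pattern.search(diagnostics)]


if __name__ == "__main__":
    sample = "launch_container.sh: line 44: find: command not found"
    for hint in diagnose(sample):
        print(hint)
```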
      

      #4: Add a command-line example of how to tag local images

      This is already documented under "Privileged Container Security Consideration", but a one-liner would help. I had trouble running a local Docker image and tagging it appropriately; even a simple example such as docker tag local_ubuntu local/ubuntu:latest would be very informative.
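      For the tagging example, it can help to spell out how the new name is read: an image reference has the form [prefix/]name[:tag], and when the tag is omitted Docker uses latest. The sketch below splits a reference into repository, tag and first path component (local in local/ubuntu:latest). It is a simplified illustration that ignores digests (@sha256:...), the function names are hypothetical, and how YARN itself evaluates the prefix should be taken from the "Privileged Container Security Consideration" section rather than from this code.

```python
def parse_image_ref(ref: str):
    """Split an image reference into (repository, tag, first path component).

    Simplified: digests are not handled, and a missing tag defaults to "latest".
    """
    # Only look for ":" in the last path component, so that a registry
    # port such as "localhost:5000/ubuntu" is not mistaken for a tag.
    prefix, _, last = ref.rpartition("/")
    name, sep, tag = last.partition(":")
    if not sep:
        tag = "latest"
    repository = f"{prefix}/{name}" if prefix else name
    first_component = prefix.split("/", 1)[0] if prefix else ""
    return repository, tag, first_component


def tag_argv(source: str, target: str) -> list:
    """Argument list for `docker tag SOURCE TARGET`, e.g. for subprocess.run."""
    return ["docker", "tag", source, target]


if __name__ == "__main__":
    print(tag_argv("local_ubuntu", "local/ubuntu:latest"))
    # ['docker', 'tag', 'local_ubuntu', 'local/ubuntu:latest']
    print(parse_image_ref("local/ubuntu:latest"))
    # ('local/ubuntu', 'latest', 'local')
```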

      Attachments

        1. YARN-9660-002.patch (5 kB, Peter Bacsko)
        2. YARN-9660-001.patch (5 kB, Peter Bacsko)


          People

            Assignee: Peter Bacsko (pbacsko)
            Reporter: Peter Bacsko (pbacsko)
            Votes: 0
            Watchers: 10
