diff --git hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
new file mode 100644
index 0000000..1175688
--- /dev/null
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DockerContainers.md
@@ -0,0 +1,288 @@
+
+
+Launching Applications Using Docker Containers
+==============================================
+
+* [Overview](#Overview)
+* [Cluster Configuration](#Cluster_Configuration)
+* [Docker Image Requirements](#Docker_Image_Requirements)
+* [Application Submission](#Application_Submission)
+* [Connecting to a Secure Docker Repository](#Connecting_to_a_Secure_Docker_Repository)
+* [Example: MapReduce](#Example:_MapReduce)
+* [Example: Spark](#Example:_Spark)
+
+Overview
+--------
+
+[Docker](https://www.docker.io/) combines an easy-to-use interface to Linux
+containers with easy-to-construct image files for those containers. In short,
+Docker launches lightweight containers that behave much like virtual machines
+but share the host's Linux kernel.
+
+The Linux Container Executor (LCE) allows the YARN NodeManager to launch YARN
+containers into Docker containers. Users can specify the Docker images they
+want for their YARN containers. These containers provide a custom software
+environment in which the user's code runs, isolated from the software
+environment of the NodeManager and other applications. These containers can
+include special libraries needed by the application, and they can have
+different versions of native tools and libraries, including Perl, Python, and
+even Java. Indeed, these containers can run a different flavor of Linux than
+what is running on the NodeManager. However, because nothing is shared with
+the NodeManager, the Docker image must supply all the environment settings and
+libraries needed to run the job.
+
+Docker for YARN provides both consistency (all YARN containers will have the
+same software environment) and isolation (no interference with whatever is
+installed on the physical machine).
+
+The Docker support in the LCE is still evolving. To track progress, follow
+YARN-3611, the umbrella JIRA for Docker support improvements.
+
+Cluster Configuration
+---------------------
+
+The LCE requires that the container-executor binary be owned by root:hadoop and
+have 6050 permissions. In order to launch Docker containers, the Docker daemon
+must be running on all NodeManager hosts where Docker containers will be
+launched. The Docker client must also be installed on those hosts and must be
+able to start Docker containers.
+
+To prevent timeouts while starting jobs, any large Docker images to be used by
+an application should already be loaded in the Docker daemon's cache on the
+NodeManager hosts. A simple way to load an image is by issuing a Docker pull
+request. For example:
+
+```
+    sudo docker pull images/hadoop-docker:latest
+```
+
+The following properties should be set in yarn-site.xml:
+
+```xml
+  <property>
+    <name>yarn.nodemanager.container-executor.class</name>
+    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
+    <description>
+      This is the container executor setting that ensures that all applications
+      are started with the LinuxContainerExecutor.
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.linux-container-executor.group</name>
+    <value>hadoop</value>
+    <description>
+      The POSIX group of the NodeManager. It should match the setting in
+      "container-executor.cfg". This configuration is required for validating
+      the secure access of the container-executor binary.
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
+    <value>false</value>
+    <description>
+      Whether all applications should be run as the NodeManager process' owner.
+      When false, applications are launched instead as the application owner.
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
+    <value>host,none,bridge</value>
+    <description>
+      Optional. A comma-separated set of networks allowed when launching
+      containers. Valid values are determined by Docker networks available from
+      `docker network ls`
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
+    <value>false</value>
+    <description>
+      Optional. Whether applications are allowed to run in privileged containers.
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
+    <value></value>
+    <description>
+      Optional. A comma-separated list of users who are allowed to request
+      privileged containers if privileged containers are allowed.
+    </description>
+  </property>
+```
+
+In addition, a container-executor.cfg file must exist and contain settings for
+the container executor. The file uses the standard Java properties file
+format, for example:
+
+    key=value
+
+The following options are required:
+
+|Configuration Name | Description |
+|:---- |:---- |
+|yarn.nodemanager.linux-container-executor.group|The Unix group of the NodeManager. It should match the yarn.nodemanager.linux-container-executor.group setting in the yarn-site.xml file.|
+
+The following options are optional:
+
+|Configuration Name | Description |
+|:---- |:---- |
+|min.user.id|The minimum UID that is allowed to launch applications. The default is no minimum.|
+|banned.users|A comma-separated list of usernames who should not be allowed to launch applications. The default setting is: yarn, mapred, hdfs, and bin.|
+|allowed.system.users|A comma-separated list of usernames who should be allowed to launch applications even if their UIDs are below the configured minimum. If a user appears in both allowed.system.users and banned.users, the user will be considered banned.|
+|docker.binary|The path to the Docker binary. The default is "docker".|
+|feature.docker.enabled|Must be 0 or 1. 0 means launching Docker containers is disabled. 1 means launching Docker containers is allowed.|
+|feature.tc.enabled|Must be 0 or 1. 0 means traffic control commands are disabled. 1 means traffic control commands are allowed.|
+
+Docker Image Requirements
+-------------------------
+
+In order to work with YARN, there are two requirements for Docker images.
+
+First, the Docker container will be explicitly launched with the application
+owner as the container user. If the application owner is not a valid user
+(by UID) in the Docker image, the application will fail.
+
+Second, the Docker image must contain everything the application needs in
+order to execute. In the case of Hadoop (MapReduce or Spark), the Docker
+image must contain the JRE and Hadoop libraries and have the necessary
+environment variables set: JAVA_HOME, HADOOP_COMMON_HOME, HADOOP_HDFS_HOME,
+HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR. Note that the
+Java and Hadoop versions must be compatible with what's installed on the
+cluster.
+
+If an application requests a Docker image that has not already been loaded by
+the Docker daemon on the host where it is to execute, the Docker daemon will
+implicitly perform a Docker pull command. Both MapReduce and Spark assume that
+tasks which take more than 10 minutes to report progress have stalled, so
+specifying a large Docker image that is not already cached may cause the
+application to fail while the image is being pulled.
+
+Application Submission
+----------------------
+
+Before attempting to launch a Docker container, make sure that the LCE
+configuration is working for non-Docker applications. If the NodeManagers fail
+to start after enabling the LCE, the most likely cause is incorrect ownership
+and/or permissions on the container-executor binary. Check the NodeManager
+logs to confirm.
+
+In order to run an application in a Docker container, set the following
+environment variables in the application's environment:
+
+|Environment Variable Name | Description |
+|:---- |:---- |
+|YARN_CONTAINER_RUNTIME_TYPE|Determines whether an application will be launched in a Docker container. If the value is "docker", the application will be launched in a Docker container. Otherwise a regular process tree container will be used.|
+|YARN_CONTAINER_RUNTIME_DOCKER_IMAGE|Names the image that will be used to launch the Docker container. Any image name that could be passed to the Docker client's run command may be used. The image name may include a repo prefix.|
+|YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE|Controls whether the Docker container's default command is overridden. When set to true, the Docker container's command will be `bash` followed by the path to the container's launch script. When unset or set to false, the Docker container's default command is used.|
+|YARN_CONTAINER_RUNTIME_DOCKER_CONTAINER_NETWORK|Sets the network type to be used by the Docker container. It must be a valid value as determined by the yarn.nodemanager.runtime.linux.docker.allowed-container-networks property.|
+|YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER|Controls whether the Docker container is a privileged container. In order to use privileged containers, the yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed property must be set to true, and the application owner must appear in the value of the yarn.nodemanager.runtime.linux.docker.privileged-containers.acl property. If this environment variable is set to true, a privileged Docker container will be used if allowed. No other value is allowed, so the environment variable should be left unset rather than set to false.|
+|YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS|Adds additional volume mounts to the Docker container. The value of the environment variable should be a comma-separated list of mounts. All such mounts must be given as "source:dest", where the source is an absolute path that is not a symlink and that points to a localized resource. Note that as of YARN-5298, localized directories are automatically mounted into the container as volumes.|
+
+Once an application has been submitted with the correct settings to be launched
+in a Docker container, the application will behave exactly as any other YARN
+application. Logs will be aggregated and stored in the relevant history
+server. The application life cycle will be the same as for a non-Docker
+application.
+
+Connecting to a Secure Docker Repository
+----------------------------------------
+
+Until YARN-5428 is complete, the Docker client command will draw its
+configuration from the default location, which is $HOME/.docker/config.json on
+the NodeManager host. The Docker configuration is where secure repository
+credentials are stored, so use of the LCE with secure Docker repos is
+discouraged until YARN-5428 is complete.
+
+As a work-around, you may manually log in to the secure repo on every
+NodeManager host using the Docker login command:
+
+```
+    docker login [OPTIONS] [SERVER]
+
+    Register or log in to a Docker registry server, if no server is specified
+    "https://index.docker.io/v1/" is the default.
+
+    -e, --email=""       Email
+    -p, --password=""    Password
+    -u, --username=""    Username
+```
+
+Note that this approach means that all users will have access to the secure
+repo.
+
+Example: MapReduce
+------------------
+
+To submit the pi job to run in Docker containers, run the following commands:
+
+```
+    vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker"
+    hadoop jar hadoop-examples.jar pi -Dyarn.app.mapreduce.am.env="$vars" \
+        -Dmapreduce.map.env="$vars" -Dmapreduce.reduce.env="$vars" 10 100
+```
+
+Note that the application master, map tasks, and reduce tasks are configured
+independently. In this example, we are using the hadoop-docker image for all
+three.
+
+Example: Spark
+--------------
+
+To run a Spark shell in Docker containers, run the following command:
+
+```
+    spark-shell --master yarn --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
+        --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
+        --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=hadoop-docker \
+        --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
+```
+
+Note that the application master and executors are configured
+independently. In this example, we are using the hadoop-docker image for both.
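+
+Because the application master and executors are configured separately, a
+small helper can emit matching `--conf` options for both from one image name.
+The following is a minimal bash sketch; `spark_docker_opts` is a hypothetical
+helper, not part of Spark or Hadoop, and it assumes the
+`spark.yarn.appMasterEnv.` and `spark.executorEnv.` prefixes documented for
+Spark on YARN.
+
+```shell
+# Hypothetical helper (not part of Spark or Hadoop). Prints each --conf flag
+# and its value on separate lines so that the Spark application master and
+# executors get the same Docker settings. Usage: spark_docker_opts IMAGE
+spark_docker_opts() {
+  local image="$1" prefix
+  for prefix in spark.yarn.appMasterEnv spark.executorEnv; do
+    printf '%s\n' --conf "${prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker"
+    printf '%s\n' --conf "${prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${image}"
+  done
+}
+```
+
+For example, `spark-shell --master yarn $(spark_docker_opts hadoop-docker)`
+starts a Spark shell whose application master and executors both use the
+hadoop-docker image. Like any unquoted command substitution, this relies on
+the generated values containing no whitespace.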