This patch addresses the following issues that arise when Docker containers are used:
1. GPU driver and NVIDIA libraries: If GPU drivers and NVIDIA libraries are pre-packaged inside the Docker image, they can conflict with the driver and NVIDIA libraries installed on the host OS. An alternative is to detect the drivers and devices installed on the host OS and mount them when launching the Docker container. Please refer to  for more details.
2. Image detection:
From , the challenge is:
Mounting user-level driver libraries and device files clobbers the environment of the container, it should be done only when the container is running a GPU application. The challenge here is to determine if a given image will be using the GPU or not. We should also prevent launching containers based on a Docker image that is incompatible with the host NVIDIA driver version, you can find more details on this wiki page.
3. GPU isolation.
a. Use nvidia-docker-plugin to address issue #1; this is the same solution used by K8S. Issue #2 could be addressed in a separate JIRA.
We won't ship nvidia-docker-plugin with our releases; cluster admins must preinstall nvidia-docker-plugin to use GPU + Docker support on YARN. "nvidia-docker" is a wrapper around the docker binary that can address #3 as well. However, "nvidia-docker" does not provide the same semantics as docker, and using it requires setting up additional environment variables such as PATH/LD_LIBRARY_PATH. To avoid introducing additional issues, we plan to use the nvidia-docker-plugin + docker binary approach.
b. To address the GPU driver and NVIDIA libraries issue, we use nvidia-docker-plugin to create a volume that includes the GPU-related libraries and mount it when the Docker container is launched. Changes include:
- Instead of using volume-driver, this patch adds a docker volume create command to c-e and the NM Java side. The reason is that volume-driver allows only a single volume driver for each launched Docker container.
- Updated c-e and the Java side so that, if a mounted volume is a named volume in Docker, the file-existence check is skipped. (Named volumes still need to be added to the permitted list in container-executor.cfg.)
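The two changes above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual c-e or NM Java code from the patch. The function names, the "nvidia-docker" driver name, and the "nvidia_driver_375.66" volume name are hypothetical values chosen for the example; the named-volume pattern reflects the character set Docker accepts for volume names.

```python
import re

# Docker accepts named volumes matching this pattern. Anything else
# (for example a source starting with "/") is treated as a host path.
NAMED_VOLUME_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_.-]*")


def is_named_volume(source):
    """Return True if a mount source looks like a Docker named volume."""
    return NAMED_VOLUME_RE.fullmatch(source) is not None


def needs_existence_check(source):
    """Host paths must exist on disk; named volumes are resolved by Docker."""
    return not is_named_volume(source)


def volume_create_command(volume_name, driver):
    """Build an explicit `docker volume create` invocation.

    Creating the volume up front avoids relying on volume-driver,
    which applies one driver to the whole container.
    """
    return ["docker", "volume", "create", "--driver=" + driver, volume_name]


# Hypothetical values, for illustration only.
cmd = volume_create_command("nvidia_driver_375.66", "nvidia-docker")
print(" ".join(cmd))
```

In this sketch a source such as /usr/lib/nvidia would still go through the file-existence check, while nvidia_driver_375.66 would skip it and only be validated against the permitted list.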
c. To address the isolation issue:
We found that cgroup + docker doesn't work under newer Docker versions, which use runc as the default runtime. Setting --cgroup-parent to a cgroup that includes any devices.deny entry prevents the Docker container from launching.
Instead, this patch passes the allowed GPU devices via --device to the docker launch command.
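As a rough illustration of the --device approach, the sketch below builds docker run arguments from the GPU minor numbers assigned to a container. The helper name, the /dev/nvidia<N> naming of GPU device files, and the option to also expose control devices such as /dev/nvidiactl are assumptions for the example, not details confirmed by this patch.

```python
def gpu_device_args(allowed_minor_numbers, control_devices=()):
    """Build `--device` arguments exposing only the allowed GPUs.

    Each argument maps a host device file to the same path inside the
    container, using Docker's host-path:container-path form.
    """
    args = []
    # Optional shared control devices (assumed, e.g. /dev/nvidiactl).
    for dev in control_devices:
        args.append("--device=" + dev + ":" + dev)
    # One entry per GPU assigned to this container.
    for minor in sorted(allowed_minor_numbers):
        path = "/dev/nvidia" + str(minor)
        args.append("--device=" + path + ":" + path)
    return args


# Hypothetical allocation: the container may use GPUs 0 and 1 only.
print(gpu_device_args([1, 0], control_devices=["/dev/nvidiactl"]))
```

Because devices that are not listed are never passed to docker, the container cannot see them, which gives isolation without placing devices.deny entries in the parent cgroup.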