The Docker-on-YARN feature has been stable in Hadoop for a while now.
One can run Spark on Docker using the Docker-on-YARN feature by passing the container runtime environment variables to the Spark AM and executor containers, similar to this:
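A typical invocation looks like the following (the image name, mount paths, and application jar are placeholders; the `YARN_CONTAINER_RUNTIME_*` environment variables come from Hadoop's Docker container runtime):

```shell
# Requires a YARN cluster with the Docker container runtime enabled.
# The env vars must be set separately for the AM (spark.yarn.appMasterEnv.*)
# and for the executors (spark.executorEnv.*).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/spark:latest \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=my-registry/spark:latest \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/etc/passwd:/etc/passwd:ro \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar 1000
```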
This is not very user friendly. I suggest adding CLI options to specify:
- whether a Docker image should be used (--docker)
- which Docker image should be used (--docker-image)
- which Docker mounts should be used (--docker-mounts)
for the AM and executor containers separately.
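With such options, the submission above could shrink to something like this (the option names and shape are only a sketch of the proposal; the exact syntax, including how to target the AM and executors separately, is open for discussion):

```shell
# Hypothetical syntax illustrating the proposed options; not yet implemented.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --docker \
  --docker-image my-registry/spark:latest \
  --docker-mounts /etc/passwd:/etc/passwd:ro \
  --class org.apache.spark.examples.SparkPi \
  spark-examples.jar 1000
```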