Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version: 2.3.1
- Labels: None
Description
A common use case we want to support with Kubernetes is running applications from custom Docker images. Some examples include:
- A user builds an application with Gradle or Maven, using Spark as a compile-time dependency. The application's jars (both the custom-written jars and the dependencies) need to be packaged in a Docker image that can be run via spark-submit.
- A user builds a PySpark or R application and wants to include custom dependencies.
- A user wants to switch the base image from Alpine to CentOS while using either built-in or custom jars.
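As a sketch of the first use case, a custom image might simply layer the application's jars on top of a Spark base image. The base image name, tag, and /opt/spark layout below are illustrative assumptions, not a documented contract:

```dockerfile
# Hypothetical Dockerfile: extend an assumed Spark base image with
# the application's assembled jars (custom code plus dependencies).
FROM spark-base:latest

# The /opt/spark/jars location is an assumption about the base image's
# layout; nothing today guarantees this path is stable across versions.
COPY build/libs/*.jar /opt/spark/jars/
```

Switching the base OS (the Alpine-to-CentOS case) would instead mean rebuilding the base image itself, which is exactly the kind of step this ticket should document.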
We currently do not document how these custom Docker images are supposed to be built, nor do we guarantee stability of these Docker images across spark-submit versions. To illustrate how this can break down, suppose we decide to change the names of the environment variables that carry the driver/executor extra JVM options specified by spark.[driver|executor].extraJavaOptions. If we change an environment variable that spark-submit provides, then users must update their custom Dockerfiles and build new images.
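To make the breakage concrete, consider a hypothetical entrypoint baked into a custom image; the variable name SPARK_DRIVER_JAVA_OPTS is invented here for illustration and is not the actual contract:

```dockerfile
# Hypothetical fragment of a custom image. The entrypoint script reads
# environment variables that spark-submit injects at container runtime.
ENTRYPOINT ["/opt/entrypoint.sh"]

# /opt/entrypoint.sh might contain a line such as:
#   exec java $SPARK_DRIVER_JAVA_OPTS -cp "$SPARK_CLASSPATH" "$@"
# If a later spark-submit renames SPARK_DRIVER_JAVA_OPTS, this image
# silently drops the user's extraJavaOptions until it is rebuilt.
```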
Rather than jumping to an implementation immediately, though, it's worth taking a step back and considering these matters from the perspective of the end user. Toward that end, this ticket will serve as a forum where we can answer at least the following questions, and any others pertaining to the matter:
- What would be the steps a user would need to take to build a custom Docker image, given their desire to customize the dependencies and the content (OS or otherwise) of said images?
- How can we ensure the user does not need to rebuild the image if only the spark-submit version changes?
The end deliverable for this ticket is a design document; we'll then create sub-issues for the technical implementation and for documenting the contract.
Issue Links
- is duplicated by:
  - SPARK-23891 Debian based Dockerfile (Resolved)
  - SPARK-26597 Support using images with different entrypoints on Kubernetes (Resolved)
  - SPARK-26398 Support building GPU docker images (Resolved)
  - SPARK-26773 Consider alternative base images for Kubernetes (Resolved)
  - SPARK-29487 Ability to run Spark Kubernetes other than from /opt/spark (Resolved)