SPARK-24655

[K8S] Custom Docker Image Expectations and Documentation


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core

    Description

      A common use case we want to support with Kubernetes is the usage of custom Docker images. Some examples include:

      • A user builds an application with Gradle or Maven, using Spark as a compile-time dependency. The application's jars (both the custom-written jars and their dependencies) need to be packaged in a Docker image that can be run via spark-submit.
      • A user builds a PySpark or R application and wants to include custom dependencies.
      • A user wants to switch the base image from Alpine to CentOS while using either built-in or custom jars.

      We currently do not document how these custom Docker images are supposed to be built, nor do we guarantee that such images remain compatible across spark-submit versions. To illustrate how this can break down, suppose we decide to rename the environment variables that carry the driver/executor extra JVM options specified by spark.[driver|executor].extraJavaOptions. If we change the environment variable that spark-submit provides, the user must update their custom Dockerfile and build new images.
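
To make the failure mode concrete, here is a minimal Python sketch of how a custom image's entrypoint might look up the JVM options. The environment variable names below are hypothetical and are not taken from any Spark release; they only illustrate that an entrypoint hard-coded to one name breaks when spark-submit starts setting another, whereas a documented list of accepted names would let the same image keep working.

```python
import os
import shlex

# HYPOTHETICAL variable names, for illustration only. The ticket does not
# state which names spark-submit actually sets.
CANDIDATE_VARS = {
    "driver": ["SPARK_DRIVER_EXTRA_JAVA_OPTS", "SPARK_DRIVER_JAVA_OPTS"],
    "executor": ["SPARK_EXECUTOR_EXTRA_JAVA_OPTS", "SPARK_EXECUTOR_JAVA_OPTS"],
}


def resolve_java_opts(role, env=None):
    """Return the JVM options for `role` as a list of arguments.

    Each known variable name is tried in order, so an image built against
    an older name still works if a newer name is introduced later.
    """
    env = os.environ if env is None else env
    for name in CANDIDATE_VARS[role]:
        value = env.get(name)
        if value:
            # Split like a shell would, so quoted options stay intact.
            return shlex.split(value)
    return []


if __name__ == "__main__":
    print(resolve_java_opts("driver"))
```

With only the older hypothetical name set, `resolve_java_opts("driver", {"SPARK_DRIVER_JAVA_OPTS": "-Xmx2g -Dfoo=bar"})` still yields the two options. An entrypoint that read only the newer name would silently drop them.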

      Rather than jumping to an implementation immediately, though, it is worth stepping back and considering these matters from the end user's perspective. To that end, this ticket will serve as a forum for answering at least the following questions, along with any others pertaining to the matter:

      1. What steps would a user need to take to build a custom Docker image, given their desire to customize the dependencies and the content (OS or otherwise) of that image?
      2. How can we ensure the user does not need to rebuild the image if only the spark-submit version changes?

      The end deliverable for this ticket is a design document, and then we'll create sub-issues for the technical implementation and documentation of the contract.

            People

              Assignee: Unassigned
              Reporter: Matt Cheah (mcheah)