  1. Spark
  2. SPARK-31165

Multiple wrong references in Dockerfile for k8s



    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.5, 3.0.0
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Core
    • Labels:


      I am trying to follow the k8s instructions for Spark (https://spark.apache.org/docs/latest/running-on-kubernetes.html). After cloning apache/spark from GitHub on the master branch and trying to build my Docker image, I found multiple wrong folder references:


      Issue 1: The comments in the Dockerfile reference the wrong folder for the Dockerfile:

      # If this docker file is being used in the context of building your images from a Spark
      # distribution, the docker build command should be invoked from the top level directory
      # of the Spark distribution. E.g.:
      # docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

      Well that docker build command simply won't run. I only got the following to run:

      docker build -t spark:latest -f resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile . 

      which is the actual path to the Dockerfile.
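For anyone reproducing this, the Dockerfile can be located in a checkout instead of guessing the path. The following is only a minimal sketch: the function name `find_spark_dockerfile` is made up for illustration, and it relies on nothing but the standard `find` utility.

```shell
# Hypothetical helper (not part of Spark): list every file under a checkout
# whose path ends in dockerfiles/spark/Dockerfile.
# $1 is the checkout root; it defaults to the current directory.
find_spark_dockerfile() {
  find "${1:-.}" -type f -path '*/dockerfiles/spark/Dockerfile'
}
```

Running `find_spark_dockerfile .` from the root of the clone should list the path used in the working command above, assuming the repository layout is the one described in this issue.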


      Issue 2: jars folder does not exist

      After reading the tutorial I of course built Spark first, as per the instructions, with:

      ./build/mvn -Pkubernetes -DskipTests clean package

      Nonetheless, I get this error from the Dockerfile when building the image:

      Step 5/18 : COPY jars /opt/spark/jars
      COPY failed: stat /var/lib/docker/tmp/docker-builder402673637/jars: no such file or directory

      I may have found a similar issue here: https://stackoverflow.com/questions/52451538/spark-for-kubernetes-test-on-mac

      I am new to Spark, but I assumed that this jars folder would exist in the root folder of the project once the build step had created it (I ran the Maven build of the master branch successfully with the command mentioned above). Turns out it's here:



      Issue 3: missing entrypoint.sh and decom.sh due to wrong reference

      Issue 2 remains unresolved, as I can't wrap my head around the missing jars folder (bin and sbin were copied successfully after I made a dummy jars folder). I then got stuck on these two steps:

      COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
      COPY kubernetes/dockerfiles/spark/decom.sh /opt/


      Step 8/18 : COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
      COPY failed: stat /var/lib/docker/tmp/docker-builder638219776/kubernetes/dockerfiles/spark/entrypoint.sh: no such file or directory

      which makes sense since the path should actually be:


      Issue 4: /tests/ has been renamed to /integration-tests/

      And the location is wrong.

      COPY kubernetes/tests /opt/spark/tests

      has to be changed to:

      COPY resource-managers/kubernetes/integration-tests /opt/spark/tests
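The COPY failures in Issues 2 to 4 all come down to source paths that do not exist relative to the build context (the `.` passed to `docker build`). A quick way to see which ones are missing before building is a check like the one below. This is a rough sketch, not anything shipped with Spark: `check_copy_sources` is a hypothetical helper, and the paths in the commented usage are simply the COPY sources quoted in this issue.

```shell
# Hypothetical helper (not part of Spark): report which COPY sources exist
# under a build context directory.
# $1 is the build context; the remaining arguments are paths relative to it.
# Returns non-zero if at least one path is missing.
check_copy_sources() {
  context=$1
  shift
  missing=0
  for src in "$@"; do
    if [ -e "$context/$src" ]; then
      echo "ok       $src"
    else
      echo "MISSING  $src"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage from the directory used as the build context:
#   check_copy_sources . jars bin sbin \
#     kubernetes/dockerfiles/spark/entrypoint.sh \
#     kubernetes/dockerfiles/spark/decom.sh \
#     kubernetes/tests
```

Every line marked MISSING corresponds to a COPY step that would fail with "no such file or directory" for that build context.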

      I only created one issue since it looks like somebody cleaned up the repo and forgot to update these references. Am I missing something here? If I am, I apologise in advance since I am new to the Spark project. I also saw that some of these references were handled through variables in previous branches, e.g. 2.4 (https://github.com/apache/spark/blob/branch-2.4/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile), but that also does not run out of the box.
      I am also really not sure about the affected versions, since that was not clear enough to me on GitHub. Feel free to edit that field.
      Thanks in advance!





              • Assignee:
                tech4242 Nikolay Dimolarov


                • Created: