SPARK-33782

Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Kubernetes
    • Labels: None

    Description

      In YARN cluster mode, files passed to the application can be accessed from the current working directory. Looks like this is not the case in Kubernetes cluster mode.
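
      For reference, a minimal sketch of the YARN behavior described above (the file and script names are hypothetical, not taken from this issue):

      # In YARN cluster mode, a file distributed with --files is localized into the
      # driver container's working directory, so the application can read it with a
      # relative path, e.g. open("./config.json") inside app.py.
      spark-submit \
        --master yarn \
        --deploy-mode cluster \
        --files config.json \
        app.py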

      By making K8S cluster mode behave the same way, users can, for example, leverage PEX to manage Python dependencies in Apache Spark:

      pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
      PYSPARK_PYTHON=./myarchive.pex spark-submit --files myarchive.pex
      

      See also https://github.com/apache/spark/pull/30735/files#r540935585.
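
      The same PEX pattern could then carry over to K8S cluster mode. A hedged sketch assuming the behavior requested here (the master URL, container image, and application script are placeholders; PYSPARK_PYTHON is set on the driver pod via spark.kubernetes.driverEnv because the submitting shell's environment is not propagated to the pod):

      pex pyspark==3.0.1 pyarrow==0.15.1 pandas==0.25.3 -o myarchive.pex
      # ./myarchive.pex resolves only if --files places the archive in the driver's
      # working directory, which is exactly what this issue asks for on K8S.
      spark-submit \
        --master k8s://https://<api-server>:6443 \
        --deploy-mode cluster \
        --conf spark.kubernetes.container.image=<spark-image> \
        --conf spark.kubernetes.driverEnv.PYSPARK_PYTHON=./myarchive.pex \
        --files myarchive.pex \
        app.py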


              People

                • Assignee: Unassigned
                • Reporter: Hyukjin Kwon
                • Votes: 1
                • Watchers: 5
