  1. Spark
  2. SPARK-42170

Files added to the spark-submit command with master K8s and deploy mode cluster end up in a non-deterministic location inside the driver.


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0, 3.2.2
    • Fix Version/s: None
    • Component/s: Kubernetes, Spark Submit
    • Labels: None

    Description

      Files added to the spark-submit command with master K8s and deploy mode cluster end up in a non-deterministic location inside the driver.

      e.g.:

      `spark-submit --files myfile --master k8s.. --deploy-mode cluster` will upload the file to /tmp/spark-<uuid>/myfile

      The issue happens because Utils.createTempDir() creates a directory with a UUID in its name. It does not affect the --archives option, because archives are unpacked into a destination directory that is relative to the working directory. The bug affects file access both before and after app creation. For example, to distribute Python dependencies with pex, we need to attach the pex file with --files and point spark.pyspark.python at it, but the file's location cannot be determined before the app is submitted. After the app is created, referencing the files without `SparkFiles.get` does not work either.
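
      The behaviour described above can be illustrated with a small stand-alone simulation. This is only a sketch and not Spark's code: `create_temp_dir` and `simulate_submission` are hypothetical helpers that mimic a temp-dir routine embedding a random UUID in the directory name, and the `workdir` directory stands in for the driver's working directory.

```python
import os
import tempfile
import uuid


def create_temp_dir(root: str, prefix: str = "spark") -> str:
    """Hypothetical stand-in for a helper that puts a random UUID in the directory name."""
    path = os.path.join(root, f"{prefix}-{uuid.uuid4()}")
    os.makedirs(path)
    return path


def simulate_submission(root: str, file_name: str = "myfile") -> str:
    """Place file_name in a fresh UUID-named directory and return its full path."""
    target = os.path.join(create_temp_dir(root), file_name)
    with open(target, "w") as handle:
        handle.write("payload")
    return target


with tempfile.TemporaryDirectory() as root:
    first = simulate_submission(root)
    second = simulate_submission(root)

    # Same file name, but the parent directory differs on every submission,
    # so the full path cannot be known before the app is submitted.
    paths_differ = first != second
    same_name = os.path.basename(first) == os.path.basename(second)

    # A path guessed relative to the working directory does not resolve,
    # while the path returned at runtime does.
    workdir = os.path.join(root, "workdir")
    os.makedirs(workdir)
    guessed_exists = os.path.exists(os.path.join(workdir, "myfile"))
    resolved_exists = os.path.exists(first)
```

      In this sketch only the path returned at runtime resolves, which mirrors why a value such as spark.pyspark.python cannot be set to a fixed location ahead of submission.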


          People

            Assignee: Unassigned
            Reporter: Santosh Pingale (santosh.pingale)
