Spark / SPARK-30365

When deploy mode is client, why doesn't it support downloading remote "spark.files"?


    Details

    • Type: Question
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 2.3.2
    • Fix Version/s: None
    • Component/s: Spark Submit
    • Labels: None
    • Environment:
      ./bin/spark-submit \
        --master yarn \
        --deploy-mode client \
        ......

      Description

      // In client mode, download remote files.
      var localPrimaryResource: String = null
      var localJars: String = null
      var localPyFiles: String = null
      if (deployMode == CLIENT) {
        localPrimaryResource = Option(args.primaryResource).map {
          downloadFile(_, targetDir, sparkConf, hadoopConf, secMgr)
        }.orNull
        localJars = Option(args.jars).map {
          downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
        }.orNull
        localPyFiles = Option(args.pyFiles).map {
          downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
        }.orNull
      }
      

      The above SparkSubmit code from Spark 2.3 downloads the primary resource, jars, and Python files in client mode, but it does not download the files specified by "spark.files" (--files).

      I think it should be possible to download those remote files locally as well and add them to the classpath.

      For example, that would allow --files to reference a remote hive-site.xml, as sketched below.
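
      A minimal sketch of the suggested change, reusing the helpers already quoted above (downloadFileList, targetDir, sparkConf, hadoopConf, secMgr); the variable name localFiles is hypothetical and this is not the actual Spark implementation:

      // Hypothetical addition to the client-mode block above: download the
      // comma-separated list from --files / spark.files the same way jars
      // and Python files are already handled.
      var localFiles: String = null
      if (deployMode == CLIENT) {
        localFiles = Option(args.files).map {
          downloadFileList(_, targetDir, sparkConf, hadoopConf, secMgr)
        }.orNull
      }

      The resulting local paths could then be added to the driver classpath, which is what would let a remote hive-site.xml passed via --files be picked up in client mode.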


            People

            • Assignee: Unassigned
            • Reporter: wangzhun
            • Votes: 0
            • Watchers: 2
