Spark / SPARK-5479

PySpark in YARN mode needs to support non-local Python files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: PySpark, YARN
    • Labels: None

    Description

      In SPARK-5162, vgrigor reports that the following command does not currently work:
      aws emr add-steps --cluster-id "j-XYWIXMD234" \
      --steps Name=SparkPi,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--py-files,s3://mybucketat.amazonaws.com/tasks/main.py,main.py,param1],ActionOnFailure=CONTINUE

      So we need to support non-local Python files in both YARN client and cluster modes.
      Before submitting the application to YARN, we need to download non-local files to a local or HDFS path.
      Alternatively, spark.yarn.dist.files needs to support other non-local files.
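
The "download before submitting" option described above could look roughly like the sketch below. It is not the change that was merged into Spark. It only illustrates the idea, under these assumptions: an entry with no URI scheme, or with a `file` or `local` scheme, is treated as already local; `fetch` is a hypothetical caller-supplied downloader (for example a wrapper around an S3 or HDFS client); `localize_py_files` and `is_local` are made-up names.

```python
import os
from urllib.parse import urlparse

# Schemes treated as "already local" in this sketch.
# This set is an assumption, not Spark's actual rule.
LOCAL_SCHEMES = {"", "file", "local"}


def is_local(uri):
    """Return True if the URI has no scheme or a local-file scheme."""
    return urlparse(uri).scheme in LOCAL_SCHEMES


def localize_py_files(py_files, fetch, dest_dir):
    """Rewrite a comma-separated --py-files value so every entry is local.

    `fetch(uri, dest_path)` is a hypothetical downloader supplied by the
    caller; this sketch does not talk to S3 or HDFS itself.
    """
    result = []
    entries = (p.strip() for p in py_files.split(","))
    for uri in filter(None, entries):
        if is_local(uri):
            # Already reachable from the submitting machine: keep as-is.
            result.append(uri)
        else:
            # Stage the remote file under dest_dir, keeping its base name.
            dest = os.path.join(dest_dir, os.path.basename(urlparse(uri).path))
            fetch(uri, dest)
            result.append(dest)
    return ",".join(result)
```

A submit wrapper could call `localize_py_files` on the `--py-files` value and pass the rewritten list on to spark-submit. Two files with the same base name would collide in `dest_dir`, which a real implementation would need to handle.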

            People

              Assignee: Marcelo Masiero Vanzin (vanzin)
              Reporter: Lianhui Wang (lianhuiwang)
              Votes: 1
              Watchers: 7
