Spark / SPARK-24384

spark-submit --py-files with .py files doesn't work in client mode before context initialization


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.4.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: PySpark, Spark Submit
    • Labels: None

    Description

      When the given Python file is a .py file (a zip file seems fine), the Python path is apparently added dynamically only after the context is initialized.

      With this pyFile:

      $ cat /home/spark/tmp.py
      def testtest():
          return 1
      

      This works:

      $ cat app.py
      import pyspark
      pyspark.sql.SparkSession.builder.getOrCreate()
      import tmp
      print("************************%s" % tmp.testtest())
      
      $ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
      ...
      ************************1
      

      but this doesn't:

      $ cat app.py
      import pyspark
      import tmp
      pyspark.sql.SparkSession.builder.getOrCreate()
      print("************************%s" % tmp.testtest())
      
      $ ./bin/spark-submit --master yarn --deploy-mode client --py-files /home/spark/tmp.py app.py
      Traceback (most recent call last):
        File "/home/spark/spark/app.py", line 2, in <module>
          import tmp
      ImportError: No module named tmp
      

      See https://issues.apache.org/jira/browse/SPARK-21945?focusedCommentId=16488486&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16488486
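      A minimal, self-contained sketch of the usual workaround, assuming the diagnosis above (the --py-files path only lands on sys.path during context initialization): put the file's directory on sys.path yourself before importing, so the import no longer depends on when getOrCreate() runs. The tmp.py from the report is recreated in a temporary directory here so the sketch runs standalone, without Spark:

      ```python
      import os
      import sys
      import tempfile

      # Recreate the dependency module from the report (tmp.py with testtest()).
      workdir = tempfile.mkdtemp()
      with open(os.path.join(workdir, "tmp.py"), "w") as f:
          f.write("def testtest():\n    return 1\n")

      # Workaround: put the file's directory at the front of the module search
      # path manually, instead of relying on context initialization to do it.
      sys.path.insert(0, workdir)

      import tmp  # resolves even before any SparkSession/SparkContext exists

      print(tmp.testtest())
      ```

      The alternative, shown in the working example above, is simply to defer the import until after SparkSession.builder.getOrCreate() has run.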


          People

            Assignee: Hyukjin Kwon
            Reporter: Hyukjin Kwon
            Votes: 0
            Watchers: 3
