SPARK-30496: PySpark on Kubernetes does not support --py-files from remote storage in cluster mode


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 2.4.4
    • Fix Version: 3.0.0
    • Components: Kubernetes, PySpark, Spark Core
    • Labels: None

    Description

      The following spark-submit on YARN works fine: it downloads the --py-files archive from remote storage and adds it to PYTHONPATH,

      spark-submit --master yarn --deploy-mode cluster --py-files s3://bucket/packages.zip s3://bucket/etl.py

       

      The same command fails on Kubernetes with import errors for the packages in the zip file. On Kubernetes, PYTHONPATH is set as follows, containing the s3:// URI itself, which the Python import system cannot resolve:

      PYTHONPATH='/opt/spark/python/lib/pyspark.zip:/opt/spark/python/lib/py4j-*.zip:s3://bucket/packages.zip'
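      The failure mode can be reproduced with plain Python, independent of Spark: a local zip on sys.path is importable via the built-in zipimporter (which is why --py-files works when the archive is localized, as on YARN), whereas a remote URI entry is silently ignored. A minimal sketch, where the bucket path and module names (mylib, otherlib) are hypothetical:

      ```python
      import os
      import sys
      import tempfile
      import zipfile

      # Build a small package zip locally, the way --py-files ships dependencies.
      tmp = tempfile.mkdtemp()
      zip_path = os.path.join(tmp, "packages.zip")
      with zipfile.ZipFile(zip_path, "w") as zf:
          zf.writestr("mylib.py", "VALUE = 42\n")

      # A local zip on sys.path works: Python's zipimporter handles it.
      sys.path.insert(0, zip_path)
      import mylib
      print(mylib.VALUE)  # 42

      # A remote URI on sys.path is ignored by the import machinery, which is
      # why the Kubernetes PYTHONPATH above leads to import errors.
      sys.path.insert(0, "s3://bucket/packages.zip")  # hypothetical remote entry
      try:
          import otherlib  # hypothetical module inside the remote zip
      except ModuleNotFoundError as e:
          print("import failed:", e)
      ```

      This is why the fix needs Spark itself to localize remote --py-files onto the driver and executor filesystems before they are placed on PYTHONPATH.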

       


              People

                Assignee: Unassigned
                Reporter: Navdeep Poonia (navdeepniku)
