
[SPARK-832] PySpark should set worker PYTHONPATH from SPARK_HOME instead of inheriting it from the master


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0, 0.7.1, 0.7.2, 0.7.3
    • Fix Version/s: 0.8.0
    • Component/s: PySpark
    • Labels: None

    Description

      In current versions of PySpark, the worker Python processes inherit the master's PYTHONPATH environment variable. This can lead to ImportErrors in the worker processes when the master and workers use different SPARK_HOME paths. Instead, the workers should append SPARK_HOME/python/pyspark to their own PYTHONPATHs.

      To support customization of the PYTHONPATH on the workers (e.g. to add an NFS folder containing shared libraries), users would still be able to set a custom PYTHONPATH in spark-env.sh.
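
      A minimal sketch of the proposed behavior, assuming a hypothetical helper worker_env in the worker-launching code (not Spark's actual API): each worker rebuilds PYTHONPATH from its own SPARK_HOME (appending SPARK_HOME/python, the directory that contains the pyspark package, so that the import resolves) and merges in any locally set PYTHONPATH rather than inheriting the master's.

          import os

          def worker_env():
              """Build the environment for a PySpark worker process.

              Hypothetical sketch: rather than passing the master's
              PYTHONPATH through verbatim, rebuild it from this worker's
              own SPARK_HOME, preserving any PYTHONPATH the user
              configured locally (e.g. in spark-env.sh).
              """
              env = dict(os.environ)
              spark_home = env["SPARK_HOME"]
              # The directory that makes `import pyspark` resolve on this worker.
              local_pyspark = os.path.join(spark_home, "python")
              user_path = env.get("PYTHONPATH", "")
              env["PYTHONPATH"] = os.pathsep.join(
                  p for p in (local_pyspark, user_path) if p
              )
              return env

      Because the user's local PYTHONPATH is merged rather than replaced, a spark-env.sh entry (for example, an NFS folder of shared libraries) still takes effect on every worker.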


          People

            Assignee: Josh Rosen
            Reporter: Josh Rosen
            Votes: 0
            Watchers: 1
