Spark / SPARK-13587

Support virtualenv in PySpark


Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, it is not easy for users to add third-party Python packages in PySpark.

      • One way is to use --py-files (suitable for simple dependencies, but not for complicated ones, especially those with transitive dependencies).
      • Another way is to install packages manually on each node (time-consuming, and it is not easy to switch between different environments).
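      As an illustration of the first workaround, the zip that --py-files expects can be built by hand. This is a minimal sketch: `mypkg` and `my_job.py` are placeholders, standing in for a real third-party package you would normally fetch with `python3 -m pip install --target deps <package>`.

```shell
# Build a zip of pure-Python dependencies for spark-submit --py-files.
# "mypkg" is a stand-in for a real third-party package.
mkdir -p deps/mypkg
printf 'VERSION = "0.1"\n' > deps/mypkg/__init__.py

# Zip the contents so packages sit at the archive root, as --py-files expects.
(cd deps && python3 -m zipfile -c ../deps.zip mypkg/)

# Illustrative submission; the zip is shipped to every executor's PYTHONPATH:
# spark-submit --py-files deps.zip my_job.py
python3 -m zipfile -l deps.zip
```

      Note this only covers pure-Python code; packages with native extensions must match the executors' platform, which is part of why this workaround breaks down for complicated dependencies.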

      Python now has two different virtualenv implementations: the native virtualenv, and conda. This JIRA aims to bring these two tools to the distributed environment.
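      For the conda side, one way to approximate the distributed-environment idea this JIRA proposes is to pack a relocatable environment with conda-pack and ship it via `--archives`. This is a hedged sketch, not a merged Spark feature: the environment name, package list, and `my_job.py` are illustrative.

```shell
# Sketch: ship a whole conda environment to the cluster (names are placeholders).
conda create -y -n pyspark_env -c conda-forge python numpy conda-pack
conda run -n pyspark_env conda-pack -f -o pyspark_env.tar.gz

# The archive unpacks as ./environment on each node; point PySpark's
# interpreter at it so executors use the shipped environment.
export PYSPARK_PYTHON=./environment/bin/python
spark-submit --archives pyspark_env.tar.gz#environment my_job.py
```

      The appeal over --py-files is that the entire interpreter and its native dependencies travel together, so nothing needs to be preinstalled on the worker nodes.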


            People

              Assignee: Unassigned
              Reporter: Jeff Zhang (zjffdu)
              Votes: 32
              Watchers: 60
