Spark / SPARK-13587

Support virtualenv in PySpark


Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Currently, it is not easy for users to add third-party Python packages in PySpark.

      • One way is to use --py-files (suitable for simple dependencies, but not for complicated ones, especially those with transitive dependencies); a minimal sketch follows this list.
      • Another way is to install packages manually on each node (time-consuming, and it is hard to switch between different environments).
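
      A minimal sketch of the --py-files route, assuming a hypothetical archive "deps.zip" that contains the package source; the same file can also be attached at runtime with SparkContext.addPyFile:

          from pyspark.sql import SparkSession

          # Ship a zipped pure-Python package to the executors. "deps.zip" is a
          # hypothetical archive; native extensions and transitive dependencies
          # are not handled by this mechanism.
          spark = (SparkSession.builder
                   .appName("py-files-example")
                   .config("spark.submit.pyFiles", "deps.zip")
                   .getOrCreate())

          # The same archive can also be added after the context exists.
          spark.sparkContext.addPyFile("deps.zip")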

      Python now has two different virtualenv implementations: the native virtualenv, and conda. This JIRA aims to bring these two tools to the distributed environment.
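
      For reference, a minimal sketch of the workaround available without this feature: pre-build a conda environment, pack it into an archive (e.g. with conda-pack), and ship the archive to the executors. The environment name, archive name, and the spark.archives config (available in newer Spark releases; YARN users on older releases would use spark.yarn.dist.archives) are assumptions for illustration:

          # Built beforehand on the driver, e.g.:
          #   conda create -y -n pyspark_env python numpy pandas
          #   conda pack -n pyspark_env -o pyspark_env.tar.gz
          import os
          from pyspark.sql import SparkSession

          # Make the executors use the interpreter inside the unpacked archive.
          os.environ["PYSPARK_PYTHON"] = "./environment/bin/python"

          spark = (SparkSession.builder
                   .appName("conda-archive-example")
                   # "#environment" is the directory the archive is unpacked into
                   .config("spark.archives", "pyspark_env.tar.gz#environment")
                   .getOrCreate())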

Attachments

Issue Links

There are no Sub-Tasks for this issue.

Activity


People

    Assignee: Unassigned
    Reporter: zjffdu (Jeff Zhang)

Dates

    Created:
    Updated:
