Details

    • Type: New Feature
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

Description

Currently, it is not easy for users to add third-party Python packages in PySpark.

    • One way is to use --py-files. This works for simple dependencies, but not for complicated ones, especially packages with transitive dependencies (see the sketch after this list).
    • Another way is to install the packages manually on each node, which is time-consuming and makes it hard to switch between different environments.
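
A minimal sketch of the current --py-files workflow and its limitation. SparkContext.addPyFile is the programmatic equivalent of passing --py-files to spark-submit; deps.zip and mymodule are hypothetical placeholders, and note that nothing in this path resolves pip-style transitive dependencies.

{code:python}
# Shell equivalent:  spark-submit --py-files deps.zip this_script.py
from pyspark import SparkContext

sc = SparkContext(appName="py-files-demo")

# Ships the archive to every executor and puts it on the workers'
# PYTHONPATH. "deps.zip" and "mymodule" are hypothetical. Only
# pure-Python code can be shipped this way, and transitive
# dependencies are NOT resolved automatically.
sc.addPyFile("deps.zip")

def use_dep(x):
    import mymodule  # resolved from deps.zip on the executor
    return mymodule.transform(x)

print(sc.parallelize(range(4)).map(use_dep).collect())
sc.stop()
{code}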

Python now has two different virtualenv implementations: the native virtualenv tool and conda. This JIRA proposes to bring these two tools to the distributed environment.
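
As a sketch of the intended user experience, a job could opt in to a per-executor virtual environment purely through configuration. The spark.pyspark.virtualenv.* property names below are assumptions based on this proposal, not a merged PySpark API.

{code:python}
# Sketch only: the spark.pyspark.virtualenv.* properties are assumed
# names from this proposal, not a released Spark feature.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("virtualenv-demo")
        .set("spark.pyspark.virtualenv.enabled", "true")
        # "native" = virtualenv + pip; "conda" = conda environments.
        .set("spark.pyspark.virtualenv.type", "native")
        # requirements.txt (e.g. from `pip freeze`) would be resolved
        # on each executor at startup, transitive dependencies included.
        .set("spark.pyspark.virtualenv.requirements", "requirements.txt")
        .set("spark.pyspark.virtualenv.bin.path", "/usr/bin/virtualenv"))

sc = SparkContext(conf=conf)

def use_numpy(x):
    import numpy  # installed into the executor's private virtualenv
    return float(numpy.sqrt(x))

print(sc.parallelize(range(4)).map(use_numpy).collect())
sc.stop()
{code}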

People

    • Assignee: Unassigned
    • Reporter: Jeff Zhang (zjffdu)
    • Votes: 22
    • Watchers: 41
