Zeppelin / ZEPPELIN-1419

PySpark dependencies support


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: pySpark
    • Labels: None

    Description

      Would it be possible to add support for declaring dependencies in a notebook?

      Ideally, one would declare the dependencies at the top of the notebook, the way Python developers list them in a requirements.txt file.

      PySpark would then handle the installation and deployment of these dependencies automatically, inside a virtualenv or a conda environment.

      This would make PySpark jobs completely independent from each other. If a notebook needs a Python library that does not exist on the cluster, it is installed automatically, without conflicting with already installed packages (doing a `sudo pip install` of anything is not recommended), and with all transitive dependencies downloaded from pypi.python.org as well.
      Also, two different jobs could use the same library in two different versions.
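
      The per-notebook isolation described above could be sketched with the Python standard library alone. The helper name `provision_env` below is hypothetical (not a Zeppelin, Spark, or Toree API); it only illustrates the idea of installing a notebook's requirements into a private virtualenv instead of the system site-packages:

```python
import subprocess
import venv
from pathlib import Path

def provision_env(requirements, env_dir):
    """Create an isolated virtualenv at env_dir and pip-install the given
    requirement strings into it, leaving system packages untouched.

    Returns the path to the environment's Python interpreter
    (POSIX layout assumed: <env_dir>/bin/python).
    """
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = Path(env_dir) / "bin" / "python"
    if requirements:
        # Each notebook gets its own env, so two notebooks can pin
        # the same library at different versions without conflict.
        subprocess.check_call(
            [str(python), "-m", "pip", "install", "--quiet", *requirements]
        )
    return python
```

      A runner could then launch the notebook's Python processes with the returned interpreter, so transitive dependencies resolved by pip are visible only to that one job.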

      I am working on this support in PySpark (SPARK-16367) and in Toree for Jupyter (TOREE-337). Let me know what you think.


People

    Assignee: Unassigned
    Reporter: gsemet (gaetan@xeberon.net)
    Votes: 0
    Watchers: 4
