  Spark / SPARK-17428

SparkR executors/workers support virtualenv


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None

    Description

      Many users need third party R packages in executors/workers, but SparkR cannot satisfy this requirement elegantly. For example, users have to ask the IT/administrators of the cluster to deploy these R packages on every executor/worker node, which is very inflexible.

      I think we should support third party R packages for SparkR users, as we do for jar packages, in the following two scenarios (see the sketch after this list):
      1. Users can install R packages from CRAN or a custom CRAN-like repository on each executor.
      2. Users can load their local R packages and install them on each executor.
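
      For example, the user-facing API might look roughly like the following. This is a sketch only; the function name spark.addRPackage and its arguments are hypothetical and do not exist in SparkR today:

      # Hypothetical API sketch -- spark.addRPackage is not an existing SparkR function.
      # Scenario 1: install a package from CRAN (or a custom repository) on every executor.
      spark.addRPackage("forecast", repos = "https://cran.r-project.org")
      # Scenario 2: ship a local source package and install it on every executor.
      spark.addRPackage("/path/to/mypkg_0.1.0.tar.gz")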

      To achieve this goal, the first thing is to make SparkR executors support virtualenv-like isolation, as Python does with conda. I have investigated and found that packrat (http://rstudio.github.io/packrat/) is one of the candidates to support virtualenv for R. Packrat is a dependency management system for R that can isolate the dependent R packages in its own private package space. SparkR users could then install third party packages in the application scope (destroyed after the application exits) and would not need to bother IT/administrators to install these packages manually.
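
      To illustrate how packrat isolates a private package library, here is a minimal local sketch using packrat's standard API; how SparkR would drive these steps on each executor is exactly the design question here:

      # Create an isolated, per-application package library with packrat.
      packrat::init("/tmp/sparkr-app-libs")   # private library scoped to this directory
      install.packages("data.table")          # installs into the private library, not the system one
      packrat::snapshot()                     # record the exact package versions in use
      # Another executor can reproduce the same environment from the snapshot:
      packrat::restore()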

      I would like to know whether this makes sense.

            People

              Assignee: Yanbo Liang
              Reporter: Yanbo Liang
              Votes: 2
              Watchers: 7
