Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Done
Description
Many users need to use third-party R packages on executors/workers, but SparkR cannot satisfy this requirement elegantly. For example, you have to work with the IT/administrators of the cluster to deploy these R packages on each executor/worker node, which is very inflexible.
I think we should support third-party R packages for SparkR users, just as we do for jar packages, in the following two scenarios:
1. Users can install R packages from CRAN or a custom CRAN-like repository on each executor.
2. Users can load their local R packages and install them on each executor.
To achieve this goal, the first step is to make SparkR executors support a virtualenv-like mechanism, similar to conda for Python. I have investigated and found that packrat (http://rstudio.github.io/packrat/) is one candidate for providing virtualenv-style isolation in R. Packrat is a dependency management system for R that can isolate the dependent R packages in its own private package library. SparkR users could then install third-party packages at application scope (destroyed after the application exits) and would not need to bother IT/administrators to install these packages manually. A minimal sketch of this kind of isolation is shown below.
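As a rough illustration of the isolation packrat provides (not the proposed SparkR integration itself; the project path and the data.table package are only illustrative examples), the sketch below creates a private, application-scoped library, installs a third-party package into it, and snapshots the dependency versions so the same environment could be restored on other nodes.
{code:r}
library(packrat)

# Create a private, project-scoped package library (illustrative path).
packrat::init("/tmp/sparkr-app-env", enter = TRUE)

# Install a third-party package into the private library only;
# the system-wide R library is left untouched.
install.packages("data.table")

# Record the exact dependency versions so the same set of packages
# can be reproduced elsewhere.
packrat::snapshot()

# On another node, packrat::restore() would reinstall the recorded
# packages into that node's private library.
{code}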
I would like to know whether this makes sense.
Issue Links
- is related to
  - SPARK-13587 Support virtualenv in PySpark (In Progress)
  - SPARK-16367 Wheelhouse Support for PySpark (Resolved)
- relates to
  - SPARK-17577 SparkR support add files to Spark job and get by executors (Resolved)