Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SparkR
    • Labels:
      None
    • Target Version/s:

      Description

      The SparkR project [1] provides a lightweight frontend to launch Spark jobs from R. The project was started at the AMPLab around a year ago and has been incubated as its own project to make sure it can be easily merged into upstream Spark, i.e. without introducing any external dependencies. SparkR's goals are similar to PySpark's, and it shares a similar design pattern, as described in our meetup talk [2] and Spark Summit presentation [3].

      Integrating SparkR into the Apache project will enable R users to use Spark out of the box and, given R's large user base, will help the Spark project reach more users. Additionally, work-in-progress features such as R integration with ML Pipelines and DataFrames can be better achieved by development in a unified code base.

      SparkR is available under the Apache 2.0 License and does not have any external dependencies other than requiring users to have R and Java installed on their machines. SparkR's developers come from many organizations, including UC Berkeley, Alteryx, and Intel, and we will support future development and maintenance after the integration.

      [1] https://github.com/amplab-extras/SparkR-pkg
      [2] http://files.meetup.com/3138542/SparkR-meetup.pdf
      [3] http://spark-summit.org/2014/talk/sparkr-interactive-r-programs-at-scale-2
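
      Purely for illustration, here is a minimal sketch of what launching a Spark job from R looks like with SparkR, in the style of the word-count example from the SparkR-pkg repository [1]. The function names (sparkR.init, textFile, flatMap, lapply, reduceByKey, collect) are taken from that package's documentation at the time and are assumptions here; they may differ in the version merged into Spark.

          # Minimal SparkR sketch; API names assumed from the amplab-extras SparkR-pkg [1]
          # and may differ in the version integrated into Spark 1.4.0.
          library(SparkR)

          # Connect to a local Spark instance.
          sc <- sparkR.init(master = "local")

          # Distributed word count over a text file (the HDFS path is illustrative).
          lines  <- textFile(sc, "hdfs:///data/README.md")
          words  <- flatMap(lines, function(line) strsplit(line, " ")[[1]])
          pairs  <- lapply(words, function(word) list(word, 1L))
          counts <- reduceByKey(pairs, "+", 2L)   # sum the 1L counts per word

          # Bring the results back into the local R session.
          head(collect(counts))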

        Issue Links

          Activity

          shivaram Shivaram Venkataraman added a comment -

          Issue resolved by pull request 5096
          https://github.com/apache/spark/pull/5096

          apachespark Apache Spark added a comment -

          User 'shivaram' has created a pull request for this issue:
          https://github.com/apache/spark/pull/5096

          apachespark Apache Spark added a comment -

          User 'davies' has created a pull request for this issue:
          https://github.com/apache/spark/pull/5077

          pwendell Patrick Wendell added a comment -

          I see the decision here as somewhat orthogonal to vendors and vendor packaging. Vendors can choose whether to package this component or not, and some may leave it out until it gets more mature. Of course, they are more encouraged/pressured to package things that end up inside the project itself, but that could be used to justify merging all kinds of random stuff into Spark, so I don't think it's a sufficient justification.

          The main argument, as I said before, is just that non-JVM language APIs are really not possible to maintain outside of the project, because they are not built on any even remotely "public" API. Imagine if we tried to have PySpark as its own project: it is so tightly coupled that it wouldn't work.

          I have argued in the past for things to exist outside the project when they can, and I still promote that strongly.

          harisekhon Hari Sekhon added a comment - edited

          OK, replace the word "packaging" with upstream integration and support, similar to HCatalog going into Hive because it makes sense. This way it's standardized across all platforms, not left to the whim of a particular vendor's packaging strategy to bolt it on for you, or to DIY. I agree with Matei and Jason that it seems like a logical extension of the major-language support that makes Spark so accessible. A lot of people know R and feel more comfortable sticking to their RStudio; this would surely benefit the Apache Spark project's popularity and accessibility even more, and help Databricks etc.

          srowen Sean Owen added a comment -

          Hari Sekhon Yes, we work for banks, as you know. I don't follow why SparkR has to be part of Apache Spark for, let's suppose, Cloudera to package SparkR in CDH if desired. These are different ideas. Packaging and support do have value, and take work; we agree. You're saying you don't want to do that packaging work, and who would? It doesn't follow that it's the Spark project that should do it for you. It shifts vendor-ish work to the volunteers in the open source project, IMHO. You say support is important, and it is, but putting SparkR in Spark does not create any production support for SparkR by itself – the Apache project is not something that people get support contracts from.

          You're saying there is market demand, and I don't doubt it. I think what you're arguing is that vendors like Cloudera should package and support SparkR, and that may be, though I do not see any demand on this end yet. But then I'd merely suggest we agree, and that this answers a different question than the one here, which is whether it should be a "mandatory" part of the Apache project. Not vendor distros.

          harisekhon Hari Sekhon added a comment -

          Sean - ever worked for a bank?

          What you've said is tantamount to saying Cloudera has zero value because people can download Apache Hadoop for free from the Apache website and carefully select compatible component versions (remember the Pig vs Hadoop version mismatches, anyone?), then hand-write all the XML and build all the automation and packaging themselves, then self-support it based on documentation and code diving (those days before CDH were good for learning and bad for productivity, btw).

          Commercial support and professional pre-packaged integration are very important to financials and other large traditional enterprises (e.g. Experian, another former employer) - exactly the environments where the vendors need to make their bread - those compile-it-yourself, self-supporting web-scale companies like the ones I worked for before Cloudera rarely pay vendors!

          Btw, I did build SparkR a few times - quite frankly, I'm sick of dealing with it for every cluster, every release, and differing versions of stuff that need to line up to avoid serial ID mismatch exceptions, etc.

          Nobody wants to give this to quants as a production tool without any support; because of the nature of these large environments, the buck has to stop with somebody - and nobody wants to put their own head on the chopping block for supplying unsupported technology - that's one of the reasons vendors like Databricks, Cloudera, Hortonworks, etc. exist.

          I know Alteryx are also eager for it - another tool we use, and another problem area of scale it would solve for all their customers (technically they could rewrite it in one of the other API languages, but given they already have modules in R, SparkR would make a bit more sense to port to) - as well as other data scientists I used to work with who were talking about wanting this early last year... we thought it would have happened by now... I even asked people a few months ago, such as one of the SparkR guys and vendors who I was told had spoken to Databricks about it, but I've just realized I should have also raised a jira like this directly here myself, as I usually do.

          Now that Revolution R has been bought by Microsoft, the timing for Databricks to add this is good too.

          srowen Sean Owen added a comment -

          Hari Sekhon I think that's an orthogonal concern. It can be packaged and aligned and all that by anyone, including you for your customer, without it being in the Spark project. Why do you tell them they can't use SparkR?

          The upside to integrating it is of course that it's forced to stay more aligned and probably more readily accessible. The downside is more maintenance burden passed on to the project, even for the majority of downstream consumers that don't care about R or SparkR. This is really not trivial. (I personally do not think this is something that belongs in the Spark project but would not strongly object.)

          FWIW, there are some customers I've talked to that are interested in R and Spark, but nobody has requested SparkR. This is probably mostly due to it all being still new to the mainstream. This colors my perception of the tradeoff, though; the view of demand from where I sit is a useful data point.

          harisekhon Hari Sekhon added a comment - edited

          SparkR absolutely must go into mainline and be shipped in core Spark by all the vendors... having to deal with it separately and compile it, with no support because it's an add-on, are all major barriers to enterprise adoption, and it hurts Spark's offering too.

          The stakeholders at my current banking client are literally crying out for SparkR over and over, and when we tell them to go use PySpark instead, they still insist that R is too important a language to them.

          Also some vendors that want to build on Spark would benefit from SparkR to replace existing product workflows where standard R integration is currently used (eg. Alteryx).

          jason.dai Jason Dai added a comment -

          I agree with this proposal. Given all ongoing efforts around data analytics in Spark (e.g., DataFrame, ml, etc.), an R frontend for Spark seems to be very well aligned with the project's future plans.

          srowen Sean Owen added a comment -

          (SGTM, that's good authoritative reasoning. I was mostly prompting the question.)

          matei Matei Zaharia added a comment -

          Yup, there's a tradeoff, but given that this is a language API and not an algorithm, input source or anything like that, I think it's important to support it along with the core engine. R is extremely popular for data science, more so than Python, and it fits well with many existing concepts in Spark.

          pwendell Patrick Wendell added a comment -

          It's a fair point to ask about something like this, and I am a huge supporter of having more decoupled, community-driven projects around Spark. The main criteria I think are worth evaluating are the benefit to the project, the long-term maintenance responsibility, and how easy it would be for the project to exist outside of the Spark codebase. In this case, for a language API like this, it's hard for me to see it succeeding outside of the project. If you look at PySpark, there are a lot of internal optimizations and modifications we end up doing for it, because it doesn't build cleanly on top of other APIs. In fairness, that reasoning alone could justify having a million random language APIs in Spark, which I don't think we want. So for me there is also a sense of deciding whether R is worth elevating to a first-class language in the Spark community. My feeling is that this is worth doing given what I've perceived of the demand for R. I wouldn't anticipate adding any additional language APIs in the future. However, let's continue to discuss. It's by no means black-and-white.

          shivaram Shivaram Venkataraman added a comment -

          Thanks Sean Owen for your comment. As with anything, there is a cost-benefit trade-off here, and I think in this case the benefits are significant. In the short term, the integration will help with stable, coordinated releases for users and make it easier for downstream packaging efforts, etc. In the longer term, given R's popularity as a data science language, I think it's crucial for the Spark project to have a well-supported interface for R users – whether that looks like RDDs or DataFrames etc. is a different question – and I think from the project's perspective this is a great opportunity to reach more users.

          In terms of complexity, the SparkR code base has only around 1000 lines of Scala code and 4000 lines of R code (a third of which are test cases) and is pretty small compared to most of the other components.

          Anyway, that's the trade-off I see, and as this JIRA is more of an RFC, it'll be great to hear other viewpoints as well.

          srowen Sean Owen added a comment -

          Predictably, I'll ask: is this not something that could simply remain a stand-alone project? It can be given visibility at http://spark-packages.org/. The argument for putting everything into one code base so as to keep it synchronized has its limits, and I sense there is push-back on adding any more to an already complex project. JMHO


            People

            • Assignee:
              shivaram Shivaram Venkataraman
            • Reporter:
              shivaram Shivaram Venkataraman
            • Votes:
              4
            • Watchers:
              26

              Dates

              • Created:
                Updated:
                Resolved:

                Development