Spark / SPARK-5654

Integrate SparkR into Apache Spark


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: SparkR
    • Labels: None

    Description

      The SparkR project [1] provides a lightweight frontend to launch Spark jobs from R. The project was started at the AMPLab around a year ago and has been incubated as its own project to make sure it can be easily merged into upstream Spark, e.g., by not introducing any external dependencies. SparkR’s goals are similar to PySpark’s, and it shares a similar design pattern, as described in our meetup talk [2] and Spark Summit presentation [3].

      Integrating SparkR into the Apache project will enable R users to use Spark out of the box, and given R’s large user base, it will help the Spark project reach more users. Additionally, work-in-progress features, such as R integration with ML Pipelines and DataFrames, can be developed more effectively in a unified code base.

      SparkR is available under the Apache 2.0 License and has no external dependencies beyond requiring users to have R and Java installed on their machines. SparkR’s developers come from many organizations, including UC Berkeley, Alteryx, and Intel, and we will support future development and maintenance after the integration.

      [1] https://github.com/amplab-extras/SparkR-pkg
      [2] http://files.meetup.com/3138542/SparkR-meetup.pdf
      [3] http://spark-summit.org/2014/talk/sparkr-interactive-r-programs-at-scale-2


          People

            Assignee: Shivaram Venkataraman
            Reporter: Shivaram Venkataraman
            Votes: 4
            Watchers: 23
