  1. Spark
  2. SPARK-13573

Open SparkR APIs (R package) to allow better 3rd party usage

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SparkR

    Description

      Currently, SparkR's R package does not expose enough of its APIs to be used flexibly. As far as I am aware, SparkR still requires you to create a new SparkContext by invoking the sparkR.init method (so you cannot connect to a running one), and there is no way to invoke custom Java methods through the exposed SparkR API (unlike PySpark).

      We currently maintain a fork of SparkR that powers the R implementation of Apache Toree, a gateway for using Apache Spark. This fork provides a connect method (to use an existing SparkContext), exposes needed methods such as invokeJava (so that we can communicate with our JVM to retrieve code to run, etc.), and uses reflection to access org.apache.spark.api.r.RBackend.
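
      To illustrate the reflection step only, here is a rough Java sketch of reaching a non-public class by name and calling one of its methods. It is not the code from the fork: HiddenBackend is a hypothetical stand-in for a class like org.apache.spark.api.r.RBackend, and the method name init and the port value are assumptions made for the example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

public class ReflectiveBackendAccess {

    // Hypothetical stand-in for a non-public backend class such as
    // org.apache.spark.api.r.RBackend. This is NOT the real Spark class.
    private static final class HiddenBackend {
        private HiddenBackend() {}

        // Pretends to start a server and returns the port it listens on
        // (the value is made up for this example).
        private int init() {
            return 12345;
        }
    }

    // Loads a class by name, instantiates it through its no-arg constructor,
    // and calls a no-arg method on it, bypassing access checks on both.
    static Object invokeHidden(String className, String methodName)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        Constructor<?> ctor = cls.getDeclaredConstructor();
        ctor.setAccessible(true);
        Object instance = ctor.newInstance();
        Method method = cls.getDeclaredMethod(methodName);
        method.setAccessible(true);
        return method.invoke(instance);
    }

    // Convenience wrapper that targets the stand-in class above.
    static int startHiddenBackend() throws ReflectiveOperationException {
        return (Integer) invokeHidden(HiddenBackend.class.getName(), "init");
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println("backend port: " + startHiddenBackend());
    }
}
```

      Having to go through setAccessible like this is the kind of workaround that a public API would make unnecessary.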

      Here is the documentation I recorded regarding changes we need to enable SparkR as an option for Apache Toree: https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources

            People

              Assignee: Unassigned
              Reporter: Chip Senkbeil (senkwich)
              Votes: 0
              Watchers: 5
