Details
- Type: Improvement
- Priority: Major
- Status: Resolved
- Resolution: Incomplete
Description
Currently, SparkR's R package does not expose enough of its API to be used flexibly. As far as I am aware, SparkR still requires you to create a new SparkContext by invoking the sparkR.init method (so you cannot connect to an already running one), and there is no way to invoke custom Java methods through the public SparkR API (unlike PySpark).
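To illustrate the gap, here is a sketch of what calling into the JVM looks like today. The `SparkR:::` calls below reach into SparkR's *internal* (unexported) functions, which is exactly the problem: there is no supported public equivalent, whereas PySpark exposes its JVM gateway directly. The function names are assumptions based on SparkR's internals and may differ across versions; this requires a local Spark installation to actually run.

```r
library(SparkR)

# The only supported entry point: launches a brand-new backend/context.
# There is no public way to attach to a context that is already running.
sc <- sparkR.init(master = "local[*]")

# Invoking arbitrary JVM code requires ":::" access to internals
# (assumed internal helpers, not part of the exported API):
millis <- SparkR:::callJStatic("java.lang.System", "currentTimeMillis")

# PySpark, by contrast, exposes this publicly through the py4j gateway:
#   sc._jvm.java.lang.System.currentTimeMillis()
```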
We currently maintain a fork of SparkR that powers the R implementation of Apache Toree, a gateway for using Apache Spark. This fork provides a connect method (to attach to an existing SparkContext), exposes needed methods such as invokeJava (so we can communicate with our JVM, e.g. to retrieve code to run), and uses reflection to access org.apache.spark.api.r.RBackend.
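The "connect" pattern the fork adds can be sketched roughly as follows. `SparkR:::connectBackend` is (to my knowledge) the internal routine that `sparkR.init` itself uses to attach to the RBackend socket; the environment variable name and host are illustrative assumptions, not part of any real API.

```r
library(SparkR)

# Hypothetical setup: an RBackend is already running in some JVM
# (e.g. one started by Toree), and its port is known to this process.
existingPort <- as.integer(Sys.getenv("EXISTING_SPARKR_BACKEND_PORT"))

# Attach to that backend instead of launching a new one via sparkR.init.
# connectBackend is an internal SparkR function; a public "connect"
# method would make this pattern supported rather than a hack.
SparkR:::connectBackend("localhost", existingPort)

# Once attached, invokeJava-style calls can target the running JVM,
# e.g. to fetch code for the interpreter to execute.
```

Making something like this public is what would let Toree (and similar gateways) drop their fork of SparkR.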
Here is the documentation I recorded regarding changes we need to enable SparkR as an option for Apache Toree: https://github.com/apache/incubator-toree/tree/master/sparkr-interpreter/src/main/resources
Issue Links
- is related to SPARK-16581 Making JVM backend calling functions public (Resolved)