Description
This is an issue reported on the Spark user mailing list; see http://comments.gmane.org/gmane.comp.lang.scala.spark.user/35742
The issue does not occur in an interactive SparkR session, but it does occur when executing an R file.
The following example code can be put into an R file to reproduce this issue:
.libPaths(c("/home/user/spark-1.6.1-bin-hadoop2.6/R/lib", .libPaths()))
Sys.setenv(SPARK_HOME = "/home/user/spark-1.6.1-bin-hadoop2.6")
library("SparkR")
sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
sqlContext <- sparkRSQL.init(sc)
df <- read.df(sqlContext, "file:///home/user/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv", "csv")
showDF(df)
The error message is as follows:
16/06/19 15:48:56 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:160)
at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala
Calls: read.df -> callJStatic -> invokeJava
Execution halted
The reason is that when an R file is executed, the R backend is launched before the R interpreter, so there is no opportunity for packages specified with ‘sparkPackages’ to be processed.
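To make the cause easier to follow, below is a simplified, illustrative sketch of the relevant behavior. It is not the actual SparkR source; only EXISTING_SPARKR_BACKEND_PORT is a real environment variable (set when the backend JVM has already been launched for the R process), the rest is illustrative.

# Simplified sketch, not the actual SparkR code, of what sparkR.init() roughly does.
existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
if (nchar(existingPort) > 0) {
  # The R file was started by an already-running JVM backend (e.g. via spark-submit),
  # so sparkR.init() only connects to it. sparkPackages can no longer influence the
  # JVM classpath, because --packages would have had to be resolved before launch.
  backendPort <- as.integer(existingPort)
} else {
  # Interactive session: sparkR.init() launches spark-submit itself, so at this point
  # it can still turn sparkPackages into a --packages option.
  backendPort <- NA_integer_  # placeholder; the real code launches the backend and reads the port back
}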
This JIRA is filed to track the problem. An appropriate solution is to be discussed; one option may be to document the limitation.
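For reference, a possible workaround (my assumption, not something stated in the mailing-list report) is to make sure the --packages option reaches spark-submit before the backend JVM starts. When the R file is run directly with Rscript, so that sparkR.init() launches the backend itself, SparkR's SPARKR_SUBMIT_ARGS environment variable can carry the option; when the file is run through spark-submit, passing --packages on the spark-submit command line itself would presumably be needed instead. A minimal sketch of the Rscript case:

# Possible workaround sketch (assumption, not confirmed in the report).
# Same .libPaths()/SPARK_HOME setup as in the reproduction above, then:
Sys.setenv("SPARKR_SUBMIT_ARGS" = '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')
library("SparkR")
sc <- sparkR.init()  # no sparkPackages here; the option is passed via SPARKR_SUBMIT_ARGS
sqlContext <- sparkRSQL.init(sc)
df <- read.df(sqlContext,
              "file:///home/user/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv",
              "csv")
showDF(df)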