Spark / SPARK-16055

sparkR.init() can not load sparkPackages when executing an R file


Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.0
    • Component/s: SparkR
    • Labels: None

    Description

This is an issue reported on the Spark user mailing list; refer to http://comments.gmane.org/gmane.comp.lang.scala.spark.user/35742

      This issue does not occur in an interactive SparkR session, but it does occur when executing an R file.

      The following example code can be put into an R file to reproduce this issue:

# Make the SparkR package shipped with Spark visible to this R session.
      .libPaths(c("/home/user/spark-1.6.1-bin-hadoop2.6/R/lib", .libPaths()))
      Sys.setenv(SPARK_HOME = "/home/user/spark-1.6.1-bin-hadoop2.6")
      library("SparkR")
      # Request the spark-csv package; when the file is executed
      # non-interactively, this option silently takes no effect.
      sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.11:1.4.0")
      sqlContext <- sparkRSQL.init(sc)
      # Reading with the "csv" source then fails, because spark-csv was never loaded.
      df <- read.df(sqlContext, "file:///home/user/spark-1.6.1-bin-hadoop2.6/data/mllib/sample_tree_data.csv", "csv")
      showDF(df)
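
      To reproduce, the file (say repro.R, a hypothetical file name) is executed non-interactively, for example with Rscript repro.R or bin/spark-submit repro.R; running the same lines in an interactive sparkR shell works as expected.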
      

The error message is as follows:

      16/06/19 15:48:56 ERROR RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed
      Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
      java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
      at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
      at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
      at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
      at org.apache.spark.sql.api.r.SQLUtils$.loadDF(SQLUtils.scala:160)
      at org.apache.spark.sql.api.r.SQLUtils.loadDF(SQLUtils.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
      at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala
      Calls: read.df -> callJStatic -> invokeJava
      Execution halted

The reason is that, when an R file is executed, the R backend is launched before the R interpreter runs the script, so there is no opportunity for the packages specified via sparkPackages to be processed.
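
      Until this is fixed or documented, a commonly suggested workaround is to hand the --packages option to spark-submit through the SPARKR_SUBMIT_ARGS environment variable before loading SparkR. The following is a sketch only, assuming SparkR 1.6 reads SPARKR_SUBMIT_ARGS when launching the backend and that its value must end with "sparkr-shell"; the paths and package coordinates are taken from the example above:

      .libPaths(c("/home/user/spark-1.6.1-bin-hadoop2.6/R/lib", .libPaths()))
      Sys.setenv(SPARK_HOME = "/home/user/spark-1.6.1-bin-hadoop2.6")
      # Pass --packages to spark-submit directly, bypassing sparkPackages.
      # The value must end with "sparkr-shell" (assumption based on how
      # SparkR builds its spark-submit arguments).
      Sys.setenv(SPARKR_SUBMIT_ARGS =
        '"--packages" "com.databricks:spark-csv_2.11:1.4.0" "sparkr-shell"')
      library("SparkR")
      sc <- sparkR.init()
      sqlContext <- sparkRSQL.init(sc)

      Passing --packages com.databricks:spark-csv_2.11:1.4.0 directly on the spark-submit command line when submitting the R file should have the same effect, since the backend then starts with the package already available.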

This JIRA is filed to track the problem; an appropriate solution is to be discussed. One option may be to simply document the limitation.

People

    Assignee: Krishna Kalyan (KrishnaKalyan3)
    Reporter: Sun Rui (sunrui)
    Votes: 1
    Watchers: 5
