Description
As of SPARK-6797, sparkr.zip is re-created each time spark-submit is run with an R application, which fails if Spark has been installed into a directory to which the current user doesn't have write permissions. (e.g., on EMR's emr-4.0.0 release, Spark is installed at /usr/lib/spark, which is only writable by root.)
Would it be possible to skip creating sparkr.zip if it already exists? That would enable sparkr.zip to be pre-created by the root user and then reused each time spark-submit is run, which I believe is similar to how pyspark works.
Another option would be to make the location configurable, as it's currently hardcoded to $SPARK_HOME/R/lib/sparkr.zip. Allowing it to be configured to something like the user's home directory or a random path in /tmp would get around the permissions issue.
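To illustrate the first proposal, here is a minimal Python sketch of the skip-if-exists behavior (combined with an optional override of the destination path). The function name `ensure_sparkr_zip` and the `dest` parameter are hypothetical, not part of Spark's actual code; the default path mirrors the hardcoded $SPARK_HOME/R/lib/sparkr.zip mentioned above.

```python
import os
import zipfile

def ensure_sparkr_zip(spark_home, dest=None):
    """Create sparkr.zip only if it does not already exist.

    `dest` illustrates the proposed configurable location; by default it
    falls back to the currently hardcoded $SPARK_HOME/R/lib/sparkr.zip.
    (Hypothetical helper, not Spark's real implementation.)
    """
    lib_dir = os.path.join(spark_home, "R", "lib")
    zip_path = dest or os.path.join(lib_dir, "sparkr.zip")
    if os.path.exists(zip_path):
        # Reuse a pre-created archive (e.g., one created by root at
        # install time), avoiding any write into $SPARK_HOME.
        return zip_path
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Archive the SparkR package directory, preserving paths
        # relative to R/lib so the zip layout is unchanged.
        for root, _dirs, files in os.walk(os.path.join(lib_dir, "SparkR")):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, lib_dir))
    return zip_path
```

With this shape, a root user could create the archive once at install time, and subsequent non-root spark-submit invocations would simply reuse it (or point `dest` at a user-writable path such as somewhere under /tmp).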
By the way, why does spark-submit even need to re-create sparkr.zip every time a new R application is launched? This seems unnecessary and inefficient, unless you are actively developing the SparkR libraries and expect the contents of sparkr.zip to change.
Issue Links
- is related to
  - SPARK-8313 Support Spark Packages containing R code with --packages (Resolved)
- relates to
  - SPARK-11524 Support SparkR with Mesos cluster (Resolved)
  - SPARK-11525 Support spark packages containing R source code in Standalone mode (Resolved)
  - SPARK-6797 Add support for YARN cluster mode (Resolved)
  - SPARK-9603 Re-enable complex R package test in SparkSubmitSuite (Resolved)