Details
- Improvement
- Status: Closed
- Major
- Resolution: Duplicate
- 1.4.0
- None
- None
Description
SparkR applications need the SparkR package library directory path (RLibDir) to load the SparkR package and to locate the R helper files inside the package.
Currently, RLibDir has to be specified in several places.
First, when you write a SparkR application, sparkR.init() lets you pass an RLibDir parameter (by default, it is the same as the SparkR package's libname on the driver host). However, hard-coding RLibDir in a program is not reasonable. It would be more flexible to pass RLibDir via the command line or an environment variable.
Additionally, in YARN cluster mode, RRunner depends on the SPARK_HOME environment variable to get RLibDir (it assumes $SPARK_HOME/R/lib).
It would therefore be better to define one consistent way to pass RLibDir to a SparkR application in all deployment modes, either as a command-line option for bin/sparkR or as an environment variable. Once RLibDir can be passed to the application this way, the RLibDir parameter of sparkR.init() can be removed. In YARN cluster mode, it can be passed to the application master (AM) using the spark.yarn.appMasterEnv.[EnvironmentVariableName] configuration option.
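To make the proposed lookup order concrete, here is a minimal sketch in Python. It is not code from Spark or SparkR. The flag name `--r-lib-dir` and the environment variable name `SPARKR_RLIB_DIR` are hypothetical and chosen only for illustration. The `$SPARK_HOME/R/lib` fallback and the `spark.yarn.appMasterEnv.` prefix come from the description above.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical names, used only for illustration; neither exists in Spark.
RLIB_DIR_FLAG = "--r-lib-dir"
RLIB_DIR_ENV = "SPARKR_RLIB_DIR"


def resolve_rlib_dir(argv: Sequence[str], env: Mapping[str, str]) -> Optional[str]:
    """Resolve RLibDir: command-line flag, then env variable, then $SPARK_HOME/R/lib."""
    # 1. An explicit command-line option wins, e.g. `--r-lib-dir /path/to/lib`.
    for i, arg in enumerate(argv):
        if arg == RLIB_DIR_FLAG and i + 1 < len(argv):
            return argv[i + 1]
    # 2. Otherwise use the dedicated environment variable, if set and non-empty.
    if env.get(RLIB_DIR_ENV):
        return env[RLIB_DIR_ENV]
    # 3. Otherwise fall back to the location assumed today in YARN cluster mode.
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        return spark_home.rstrip("/") + "/R/lib"
    return None


def app_master_env_conf(rlib_dir: str) -> Tuple[str, str]:
    """Build the conf entry that forwards the variable to the YARN application master."""
    return ("spark.yarn.appMasterEnv." + RLIB_DIR_ENV, rlib_dir)
```

With this ordering, a value given on the command line overrides the environment, and the existing SPARK_HOME-based behavior remains as the last resort. The returned key/value pair from `app_master_env_conf` could be supplied as a Spark configuration setting when submitting in YARN cluster mode.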
Issue Links
- is related to: SPARK-6797 Add support for YARN cluster mode (Resolved)