  1. Spark
  2. SPARK-8041

Consistently pass SparkR library directory to SparkR application

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.0
    • Fix Version/s: None
    • Component/s: SparkR
    • Labels: None

    Description

      The SparkR package library directory path (RLibDir) is needed by SparkR applications to load the SparkR package and to locate R helper files inside the package.

      Currently, RLibDir has to be specified in several places.

      First, when you write a SparkR application, sparkR.init() lets you pass an RLibDir parameter (by default, it is the same as the SparkR package's libname on the driver host). However, hard-coding RLibDir in a program is not reasonable. It would be more flexible to pass RLibDir via the command line or an environment variable.

      Additionally, in YARN cluster mode, RRunner depends on the SPARK_HOME environment variable to get RLibDir (it assumes $SPARK_HOME/R/lib).
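To make the SPARK_HOME dependency concrete, here is a minimal Python sketch of the derivation described above. It is not the actual RRunner code (RRunner is part of Spark itself); the function name is hypothetical, and the only thing taken from the description is the assumed $SPARK_HOME/R/lib layout.

```python
import posixpath
from typing import Mapping, Optional


def rlibdir_from_spark_home(env: Mapping[str, str]) -> Optional[str]:
    """Derive RLibDir as $SPARK_HOME/R/lib (hypothetical helper).

    Returns None when SPARK_HOME is missing or empty, which is the
    case where this approach cannot locate the SparkR package.
    """
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return None
    # posixpath is used so the result does not depend on the host OS.
    return posixpath.join(spark_home, "R", "lib")
```

The sketch shows why this is fragile: if SPARK_HOME is not set in the environment of the process that launches R, no RLibDir can be derived at all.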

      So it would be better to define a consistent way to pass RLibDir to a SparkR application in all deployment modes. It could be a command line option for bin/sparkR or an environment variable. Once RLibDir can be passed to a SparkR application this way, the RLibDir parameter of sparkR.init() can be removed. In YARN cluster mode, it can be passed to the AM (ApplicationMaster) using the spark.yarn.appMasterEnv.[EnvironmentVariableName] configuration option.
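One possible shape for this proposal is sketched below in Python. The environment variable name SPARKR_RLIBDIR, the precedence order (command line value, then environment variable, then $SPARK_HOME/R/lib) and both function names are assumptions made for illustration. None of them is defined by this issue or by Spark. Only the spark.yarn.appMasterEnv. prefix is an existing Spark configuration key.

```python
import posixpath
from typing import Dict, Mapping, Optional

# Hypothetical variable name; the issue does not name one.
RLIBDIR_ENV = "SPARKR_RLIBDIR"


def resolve_rlibdir(cli_value: Optional[str],
                    env: Mapping[str, str]) -> Optional[str]:
    """Pick RLibDir from one place, in an assumed precedence order."""
    if cli_value:                      # 1. explicit command line option
        return cli_value
    if env.get(RLIBDIR_ENV):           # 2. dedicated environment variable
        return env[RLIBDIR_ENV]
    spark_home = env.get("SPARK_HOME")
    if spark_home:                     # 3. fall back to $SPARK_HOME/R/lib
        return posixpath.join(spark_home, "R", "lib")
    return None


def yarn_am_conf(rlibdir: str) -> Dict[str, str]:
    """Build the Spark conf entry that forwards the variable to the YARN AM."""
    return {"spark.yarn.appMasterEnv." + RLIBDIR_ENV: rlibdir}
```

With a single resolution function like this, every deployment mode would read RLibDir the same way, and in YARN cluster mode the resolved value would be forwarded as one configuration entry instead of relying on SPARK_HOME being set on the AM host.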

            People

              Assignee: Unassigned
              Reporter: Sun Rui (sunrui)