SPARK-10500

sparkr.zip cannot be created if $SPARK_HOME/R/lib is unwritable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0
    • Fix Version/s: 1.6.0
    • Component/s: SparkR
    • Labels: None

    Description

      As of SPARK-6797, sparkr.zip is re-created each time spark-submit is run with an R application. This fails if Spark has been installed into a directory to which the current user doesn't have write permissions. For example, on EMR's emr-4.0.0 release, Spark is installed at /usr/lib/spark, which is writable only by root.

      Would it be possible to skip creating sparkr.zip if it already exists? That would enable sparkr.zip to be pre-created by the root user and then reused each time spark-submit is run, which I believe is similar to how pyspark works.
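
      To make the first suggestion concrete, here is a minimal sketch of "create only if missing". It is written in Python purely for illustration and is not Spark's actual code; the function name ensure_archive and its parameters are hypothetical.

```python
import os
import zipfile


def ensure_archive(src_dir, zip_path):
    """Create zip_path from the files under src_dir only if it is missing.

    Returns True if the archive was created, False if an existing one
    was reused. Hypothetical helper, not part of Spark.
    """
    if os.path.isfile(zip_path):
        # An archive pre-created by root is reused; no write access needed.
        return False
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Skip the archive itself when it lives inside src_dir.
                if os.path.abspath(full) == os.path.abspath(zip_path):
                    continue
                zf.write(full, os.path.relpath(full, src_dir))
    return True
```

      With this approach, only the first call (made by a user with write access) touches the directory; later calls return without writing anything.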

      Another option would be to make the location configurable, as it's currently hardcoded to $SPARK_HOME/R/lib/sparkr.zip. Allowing it to be configured to something like the user's home directory or a random path in /tmp would get around the permissions issue.
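
      The second suggestion could look roughly like the sketch below, again in Python for illustration only. The names pick_archive_dir and configured_dir are hypothetical; configured_dir stands in for whatever setting would carry a user-chosen location, and no such setting is claimed to exist in Spark.

```python
import os
import tempfile


def pick_archive_dir(default_dir, configured_dir=None):
    """Choose a directory in which sparkr.zip could be written.

    Order of preference (an assumption for this sketch):
      1. an explicitly configured directory,
      2. the default directory, if the current user can write to it,
      3. a fresh temporary directory.
    """
    if configured_dir:
        return configured_dir
    if os.access(default_dir, os.W_OK):
        return default_dir
    # Fall back to a private temporary directory the user can always write to.
    return tempfile.mkdtemp(prefix="sparkr-")
```

      On an installation like the EMR one described above, a non-root user would fall through to the temporary directory unless a location was configured.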

      By the way, why does spark-submit even need to re-create sparkr.zip every time a new R application is launched? This seems unnecessary and inefficient, unless you are actively developing the SparkR libraries and expect the contents of sparkr.zip to change.

            People

              Assignee: Sun Rui (sunrui)
              Reporter: Jonathan Kelly (jonathak)
              Votes: 0
              Watchers: 8