SPARK-20741: SparkSubmit does not clean up after uploading spark_libs to the distributed cache


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.2.0
    • Component/s: Spark Submit
    • Labels: None

    Description

      Running spark-submit distributes Spark's JARs to a distributed cache so that the executors can access them.
      When neither spark.yarn.jars nor spark.yarn.archive is provided, SparkSubmit creates a ZIP of all the JARs in $SPARK_HOME/jars and uploads it to the distributed cache.
      After uploading the ZIP file, SparkSubmit does not delete the local copy. This, in turn, can cause the disk on the local machine to fill up (roughly 200MB per submission) until no more submissions are possible.
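
      Below is a minimal standalone sketch (not Spark's actual implementation; the object name SparkLibsArchiveCleanup and the helper zipJars are hypothetical) of the kind of cleanup this issue calls for: build the local archive of $SPARK_HOME/jars, then make sure the local copy is deleted once it has been handed off to the distributed cache, at the latest when the JVM exits.

{code:scala}
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipOutputStream}

object SparkLibsArchiveCleanup {

  /** Zip every file in `jarsDir` into a temporary archive, mirroring the
   *  spark_libs archive spark-submit builds when neither spark.yarn.jars
   *  nor spark.yarn.archive is configured. */
  def zipJars(jarsDir: File): File = {
    val archive = File.createTempFile("spark_libs", ".zip")
    val zipOut  = new ZipOutputStream(new FileOutputStream(archive))
    try {
      val buffer = new Array[Byte](64 * 1024)
      for (jar <- Option(jarsDir.listFiles()).getOrElse(Array.empty[File]) if jar.isFile) {
        zipOut.putNextEntry(new ZipEntry(jar.getName))
        val in = new FileInputStream(jar)
        try {
          Iterator.continually(in.read(buffer)).takeWhile(_ != -1)
            .foreach(n => zipOut.write(buffer, 0, n))
        } finally in.close()
        zipOut.closeEntry()
      }
    } finally zipOut.close()
    archive
  }

  def main(args: Array[String]): Unit = {
    val sparkHome = sys.env.getOrElse("SPARK_HOME", ".")
    val archive   = zipJars(new File(sparkHome, "jars"))

    // The missing step described in this issue: remove the local copy once
    // it has been uploaded. Deleting eagerly after the upload, or at the
    // latest on JVM exit, keeps ~200MB per submission from accumulating
    // on the local disk.
    sys.addShutdownHook { archive.delete() }

    println(s"Staged local archive at ${archive.getAbsolutePath} (${archive.length()} bytes)")
    // ... upload to the distributed cache here, then delete the local file ...
  }
}
{code}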


          People

            Assignee: Lior Regev (lioron)
            Reporter: Lior Regev (lioron)
            Votes: 0
            Watchers: 2
