Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
2.1.1
None
Description
Running spark-submit has to distribute Spark's JARs to a distributed cache so that the executors can access them.
When neither spark.yarn.jars nor spark.yarn.archive is provided, SparkSubmit creates a ZIP of all the JARs in $SPARK_HOME/jars and uploads it to the distributed cache.
After uploading the ZIP file, SparkSubmit does not delete the local copy. Each submission therefore leaves about 200MB behind on the local machine, and the disk can eventually fill up until no more submissions are possible.
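
The expected behavior can be illustrated with a minimal Python sketch. It is not Spark's actual code: the function name upload_jars_archive, the upload callback, and the temporary file prefix are hypothetical and exist only for this illustration. The point it shows is that the local archive is deleted once the upload step finishes, even if the upload fails.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def upload_jars_archive(jars_dir, upload):
    """Zip the JARs in jars_dir, pass the archive to upload(), then delete it.

    `upload` is a hypothetical callback standing in for the copy to the
    distributed cache; it receives the local archive path.
    """
    # Illustrative prefix only; the real file name used by Spark may differ.
    fd, archive_path = tempfile.mkstemp(prefix="spark_libs_", suffix=".zip")
    os.close(fd)
    try:
        # JARs are already compressed, so this sketch stores them as-is.
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
            for jar in sorted(Path(jars_dir).glob("*.jar")):
                zf.write(jar, arcname=jar.name)
        return upload(archive_path)
    finally:
        # The missing step described above: remove the local copy so that
        # repeated submissions do not accumulate archives on disk.
        if os.path.exists(archive_path):
            os.remove(archive_path)
```

Putting the cleanup in a finally block means a failed upload does not leave the archive behind either.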