Details
- Bug
- Status: Resolved
- Minor
- Resolution: Not A Problem
- 1.1.1
- None
- None
Description
This issue has already been raised on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/set-spark-local-dir-on-driver-program-doesn-t-take-effect-td11040.html
The problem is usually that Spark creates a large number of files during shuffles, which fills up the small amount of disk space that most EC2 instances have for root on /mnt2.
The workaround is to set SPARK_LOCAL_DIRS to a larger volume (e.g. to the /mnt/spark volume only, removing /mnt2).
However, for this change to take effect, the daemons need to be restarted with sbin/stop-all.sh followed by sbin/start-all.sh.
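As a rough sketch of the workaround described above, assuming the variable is set in conf/spark-env.sh on every node and that the commands are run from the Spark installation directory on the master (the file location and directory layout are assumptions and may differ on a spark-ec2 cluster):

```shell
# In conf/spark-env.sh on each node (assumed location):
# use only the larger volume for Spark's local/shuffle files, dropping /mnt2.
export SPARK_LOCAL_DIRS=/mnt/spark

# Then, from the Spark installation directory on the master,
# restart the standalone daemons so they pick up the new value.
sbin/stop-all.sh
sbin/start-all.sh
```

Without the restart, the running daemons keep the value of SPARK_LOCAL_DIRS they were started with.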
Even more troubling, the Environment page of the Web UI reports that spark.local.dir is set to the new path, yet Spark still spills to /mnt2 as well.
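Since the Web UI value cannot be trusted here, one way to see where data is actually being written is to compare disk usage of the candidate directories on a worker while a shuffle-heavy job runs. This is only a minimal sketch: `report_usage` is a hypothetical helper name, and the two paths are the ones mentioned in this report.

```shell
# Hypothetical helper: print the disk usage of each directory given,
# or a note if the directory does not exist on this machine.
report_usage() {
  for d in "$@"; do
    if [ -d "$d" ]; then
      du -sh "$d"
    else
      echo "$d: not present"
    fi
  done
}

# Paths taken from this report; adjust for the actual cluster layout.
report_usage /mnt/spark /mnt2
```

If usage under /mnt2 keeps growing after the setting was changed, the daemons are most likely still running with the old local directories.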
To my knowledge this is not mentioned anywhere in the documentation or any other mailing list reply except for the one I linked.
Possible solutions are either to ensure the change actually takes effect, so that reality agrees with what the Web UI reports, or to add a section to the EC2 documentation covering this kind of problem.
Issue Links
- is related to: SPARK-6188 Instance types can be mislabeled when re-starting cluster with default arguments (Resolved)