Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.6.1
Description
Because the JVM uses fork/exec to launch child processes, every fork initially requires the kernel to account for a copy of the parent's entire memory footprint. For a large Spark JVM that spawns many child processes (for Pipe or Python support), this quickly exhausts the memory the kernel is willing to commit, and the fork fails.
This problem is discussed here:
https://gist.github.com/1970815
It results in errors like this:
13/01/31 20:18:48 INFO cluster.TaskSetManager: Loss was due to java.io.IOException: Cannot run program "cat": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
at spark.rdd.PipedRDD.compute(PipedRDD.scala:38)
at spark.RDD.computeOrReadCheckpoint(RDD.scala:203)
at spark.RDD.iterator(RDD.scala:192)
at spark.scheduler.ResultTask.run(ResultTask.scala:76)
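For reference, here is a minimal sketch (not the actual PipedRDD code; the command "cat" is just the one from the trace above) of the call path that fails: launching any external command from the JVM goes through ProcessBuilder.start(), and the fork behind that call is what raises error=12 when the parent JVM is large, no matter how small the child command is.

import scala.io.Source

object PipeSketch {
  def main(args: Array[String]): Unit = {
    val pb = new ProcessBuilder("cat")   // same call site as PipedRDD.compute
    pb.redirectErrorStream(true)
    val proc = pb.start()                // fork happens here; fails with IOException error=12
    val out = proc.getOutputStream
    out.write("hello from the parent JVM\n".getBytes("UTF-8"))
    out.close()
    Source.fromInputStream(proc.getInputStream).getLines().foreach(println)
    proc.waitFor()
  }
}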
I was able to work around this by enabling memory overcommit in the kernel on all slaves,
echo 1 > /proc/sys/vm/overcommit_memory
but we should try to include a more robust solution, such as the posix_spawn-based approach here:
https://github.com/axiak/java_posix_spawn
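If a posix_spawn-based launcher like java_posix_spawn is adopted, one option is to put process creation behind a small seam so the spawn mechanism can be swapped per platform. A rough sketch, with the trait and class names purely hypothetical (this is not java_posix_spawn's actual API, nor existing Spark code):

// Hypothetical seam; the default implementation keeps today's fork/exec behaviour.
trait ProcessLauncher {
  def launch(command: Seq[String], env: Map[String, String]): Process
}

class ForkExecLauncher extends ProcessLauncher {
  override def launch(command: Seq[String], env: Map[String, String]): Process = {
    val pb = new ProcessBuilder(command: _*)
    val pbEnv = pb.environment()
    env.foreach { case (k, v) => pbEnv.put(k, v) }
    pb.start()
  }
}

// A posix_spawn-backed ProcessLauncher could then replace ForkExecLauncher on
// Linux without requiring every slave to set vm.overcommit_memory=1, since
// posix_spawn does not duplicate the parent's address space.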