Details

- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Invalid
- Affects Version: 2.1.1
- Fix Version: None
- Component: None
Description
An `RDD.foreach` or `RDD.foreachPartition` call will swallow exceptions thrown inside its closure, but not if the exception was thrown earlier in the call chain. An example:
```scala
package examples

import org.apache.spark._

object Shpark {
  def main(args: Array[String]) {
    implicit val sc: SparkContext = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("blahfoobar")
    )

    /* DOESN'T THROW
    sc.parallelize(0 until 10000000)
      .foreachPartition { _.map { i =>
        println("BEFORE THROW")
        throw new Exception("Testing exception handling")
        println(i)
      }}
    */

    /* DOESN'T THROW, nor does anything print.
     * Commenting out the exception runs the prints.
     * (i.e. `foreach` is sufficient to "run" an RDD)
    sc.parallelize(0 until 100000)
      .foreach({ i =>
        println("BEFORE THROW")
        throw new Exception("Testing exception handling")
        println(i)
      })
    */

    /* Throws! */
    sc.parallelize(0 until 100000)
      .map({ i =>
        println("BEFORE THROW")
        throw new Exception("Testing exception handling")
        i
      })
      .foreach(i => println(i))

    println("JOB DONE!")
    System.in.read
    sc.stop()
  }
}
```
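One possible explanation for the first commented-out case is iterator laziness rather than exception swallowing: the closure passed to `foreachPartition` only calls `_.map`, which wraps the partition's iterator without consuming it, so the closure body never runs at all. A minimal sketch in plain Scala (no Spark required) illustrating this assumption:

```scala
object LazyIteratorDemo {
  def main(args: Array[String]): Unit = {
    // `Iterator.map` is lazy: this builds a new iterator but runs nothing.
    val it = Iterator(1, 2, 3).map { i =>
      throw new Exception("only thrown when consumed")
      i
    }
    println("mapped, nothing thrown yet")

    // Consuming the iterator is what actually executes the closure.
    try {
      it.foreach(println)
    } catch {
      case e: Exception => println(s"caught: ${e.getMessage}")
    }
  }
}
```

Under this reading, returning without consuming the mapped iterator makes the `foreachPartition` call a no-op, which would explain why neither the `println` nor the `throw` is observed.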
When exceptions are swallowed, the jobs don't seem to fail, and the driver exits normally. When one is thrown, as in the last example, the exception successfully rises up to the driver and can be caught with try/catch.
The expected behaviour is for exceptions in `foreach` to propagate to the driver and fail the job, as they do with `map`.