Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 0.9.0
Description
The following spark-shell code leads to an infinite retry of the last stage in Spark 0.9:
val data = sc.parallelize(1 to 100, 2).map(x => { throw new NullPointerException; (x, x) }).reduceByKey(_ + _)
data.count() // This first one terminates correctly with just an NPE
data.count() // This second one never terminates, it keeps failing over and over
The problem seems to be that when a map task dies with an NPE, we erroneously register map output locations for it, so the next job on the RDD skips the map stage and runs only the reduce stage. Those reduce tasks keep failing, but their failures are counted as fetch failures, so the scheduler keeps resubmitting the stage indefinitely.
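To make the failure mode concrete, here is a small self-contained sketch of the mechanism described above. It is toy code, not Spark's DAGScheduler or MapOutputTracker; every name in it (mapOutputLocations, shuffleData, runMapStage, runReduceStage) is invented for illustration, and the retry loop is capped only so the example terminates.

object InfiniteRetrySketch extends App {
  import scala.collection.mutable

  // Registered shuffle output locations, keyed by map partition.
  val mapOutputLocations = mutable.Map.empty[Int, String]
  // Shuffle data actually available at each location; stays empty because the map tasks failed.
  val shuffleData = mutable.Map.empty[String, Seq[(Int, Int)]]

  // Map stage: every task throws (like the NPE in the repro), but -- the bug being
  // illustrated -- an output location is registered anyway, so the stage looks complete.
  def runMapStage(numPartitions: Int): Unit =
    (0 until numPartitions).foreach { p =>
      try throw new NullPointerException()
      catch { case _: NullPointerException => mapOutputLocations(p) = s"host-$p" }
    }

  // Reduce stage: a fetch from a registered location that holds no data is reported
  // as a fetch failure, so the scheduler retries the reduce stage, not the map stage.
  def runReduceStage(): Boolean =
    mapOutputLocations.forall { case (_, loc) => shuffleData.contains(loc) }

  runMapStage(2)                               // first job: user code fails, locations left behind
  var attempts = 0
  while (!runReduceStage() && attempts < 5) {  // second job: only reduce attempts are made
    attempts += 1
    println(s"fetch failed, retrying reduce stage (attempt $attempts)")
  }
  println("capped at 5 attempts here; the 0.9 scheduler kept retrying indefinitely")
}

The second data.count() in the repro corresponds to the capped loop above: because output locations already exist for the map stage, only the reduce stage is ever attempted, and each attempt ends in another fetch failure.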