Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.3.0, 1.3.1
Fix Version/s: None
Description
I encountered a bug where Spark crashes with the following stack trace:
java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:313)
    at scala.None$.get(Option.scala:311)
    at org.apache.spark.rdd.PartitionerAwareUnionRDD.getPartitions(PartitionerAwareUnionRDD.scala:69)
Here's a minimal example that reproduces it in the Spark shell:
import org.apache.spark.HashPartitioner

val x = sc.parallelize(Seq(1 -> true, 2 -> true, 3 -> false)).partitionBy(new HashPartitioner(1))
val y = sc.parallelize(Seq(1 -> true))
sc.union(y, x).count() // crashes
sc.union(x, y).count() // works, since the first RDD has a partitioner
As a workaround, we had to instantiate UnionRDD directly to avoid the PartitionerAwareUnionRDD code path.
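A minimal sketch of that workaround, assuming the x and y RDDs from the repro above; constructing org.apache.spark.rdd.UnionRDD directly bypasses the partitioner-aware path that SparkContext.union would otherwise pick:

import org.apache.spark.rdd.UnionRDD

// Build the union manually so PartitionerAwareUnionRDD is never
// instantiated, regardless of the order of the input RDDs.
val unioned = new UnionRDD(sc, Seq(y, x))
unioned.count() // no longer hits PartitionerAwareUnionRDD.getPartitions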