Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
Description
The implementation of RDD.isEmpty() fails if there are empty partitions. It was introduced by https://github.com/apache/spark/pull/4074
Example:
sc.parallelize(Seq(), 1).isEmpty()
The above code throws an exception like this:
org.apache.spark.SparkDriverExecutionException: Execution error
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:977)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1374)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1338)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Cause: java.lang.ArrayStoreException: [Ljava.lang.Object;
at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1466)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1466)
at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1374)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1338)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
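For context: Seq() is inferred as Seq[Nothing], so the snippet above builds an RDD[Nothing]; the take(1) call underlying isEmpty() then fails to store the task's result into the driver-side result array in runJob, which is what surfaces as the ArrayStoreException in the trace. A minimal workaround sketch (the explicit element type Int is illustrative; any concrete type should behave the same way):

sc.parallelize(Seq.empty[Int], 1).isEmpty()  // expected: true, with no exception

Declaring the element type keeps Spark off the RDD[Nothing] path while still exercising the empty-partition case.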