Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
Description
The implementation of RDD.isEmpty() fails if there are empty partitions. It was introduced by https://github.com/apache/spark/pull/4074
Example:
sc.parallelize(Seq(), 1).isEmpty()
The above code throws an exception like this:
org.apache.spark.SparkDriverExecutionException: Execution error
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:977)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1374)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1338)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Cause: java.lang.ArrayStoreException: [Ljava.lang.Object;
at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:88)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1466)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1466)
at org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1374)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1338)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
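For context: Seq() is inferred as Seq[Nothing], so the snippet above builds an RDD[Nothing]; the take(1) call underlying isEmpty() then fails to store the task's result into the driver-side result array in runJob, which is what surfaces as the ArrayStoreException in the trace. A minimal workaround sketch (the explicit element type Int is illustrative; any concrete type should behave the same way):

sc.parallelize(Seq.empty[Int], 1).isEmpty()  // expected: true, with no exception

Declaring the element type keeps Spark off the RDD[Nothing] path while still exercising the empty-partition case.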