Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Version/s: 1.4.0, 1.4.1
Description
Repro:
sc.textFile("/ThisFileDoesNotExist").cache()
sc.parallelize(0 until 100).toDebugString
Output:
java.io.IOException: Not a file: /ThisFileDoesNotExist
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:215)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
    at org.apache.spark.storage.RDDInfo$.fromRdd(RDDInfo.scala:59)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:1455)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.MapLike$DefaultValuesIterable.foreach(MapLike.scala:206)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.SparkContext.getRDDStorageInfo(SparkContext.scala:1455)
    at org.apache.spark.rdd.RDD.debugSelf$1(RDD.scala:1573)
    at org.apache.spark.rdd.RDD.firstDebugString$1(RDD.scala:1607)
    at org.apache.spark.rdd.RDD.toDebugString(RDD.scala:1637
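This happens because toDebugString calls SparkContext#getRDDStorageInfo, which computes the partitions of every cached RDD, not just the one being described. Here the unrelated cached textFile RDD points at a nonexistent path, so its partition computation throws and the exception propagates out of toDebugString on the healthy RDD. toDebugString should definitely be resilient to other RDDs being invalid, and getRDDStorageInfo probably should be as well.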
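As a rough illustration of the resilience asked for above, the sketch below models the idea without depending on Spark. It is not the actual fix: FakeRdd, FakeRddInfo and ResilientStorageInfo are hypothetical stand-ins invented here, and the only assumption carried over from the report is that computing partitions for one RDD can throw. The per-RDD computation is wrapped in scala.util.Try, so one invalid RDD is skipped rather than failing the whole call.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for an RDD: computing its partitions may throw,
// as HadoopRDD.getPartitions does in the trace above for a missing path.
final case class FakeRdd(id: Int, name: String, computePartitions: () => Int)

// Hypothetical stand-in for the per-RDD summary built by RDDInfo.fromRdd.
final case class FakeRddInfo(id: Int, name: String, numPartitions: Int)

object ResilientStorageInfo {
  // Build info for every registered RDD, skipping any whose partitions
  // cannot be computed instead of letting one failure abort the whole call.
  def storageInfo(persistent: Seq[FakeRdd]): Seq[FakeRddInfo] =
    persistent.flatMap { rdd =>
      Try(rdd.computePartitions()) match {
        case Success(n) => Some(FakeRddInfo(rdd.id, rdd.name, n))
        case Failure(_) => None // invalid RDD: leave it out of the report
      }
    }

  def main(args: Array[String]): Unit = {
    val broken = FakeRdd(0, "missing-file",
      () => throw new java.io.IOException("Not a file: /ThisFileDoesNotExist"))
    val healthy = FakeRdd(1, "parallelize", () => 8)
    // Only the healthy entry is reported; the broken one is silently skipped.
    println(storageInfo(Seq(broken, healthy)))
  }
}
```

Whether to skip an invalid RDD silently, log it, or report it with an unknown partition count is a design choice this sketch does not settle.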