Spark / SPARK-3604

unbounded recursion in getNumPartitions triggers stack overflow for large UnionRDD


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 1.1.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: Linux. Used Python, but the error is in Scala land.

    Description

      I have a large number of Parquet files, all with the same schema, and attempted to make a UnionRDD out of them.

      When I call getNumPartitions(), I get a stack overflow error that looks like this:

      Py4JJavaError: An error occurred while calling o3275.partitions.
      : java.lang.StackOverflowError
      at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:239)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:243)
      at scala.collection.AbstractTraversable.map(Traversable.scala:105)
      at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:65)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
      at scala.Option.getOrElse(Option.scala:120)
      at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
      at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:65)
      at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:65)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
      at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
      at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
      at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
      at scala.collection.AbstractTraversable.map(Traversable.scala:105)
      at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:65)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
      at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
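
The repeating UnionRDD.getPartitions / RDD.partitions frames in the trace suggest that each UnionRDD has another UnionRDD as a child. The following is a minimal sketch of that pattern in plain Python, not Spark code: LeafRDD and ToyUnionRDD are hypothetical stand-ins, and it assumes (the report does not say) that the union was built by chaining pairwise unions, one per file. Under that assumption the call depth grows with the number of inputs, while a single flat union keeps it constant.

```python
from functools import reduce


class LeafRDD:
    """Toy stand-in for an RDD backed by one file (hypothetical, not Spark's API)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def partitions(self, depth=1):
        # Return (partition count, deepest call level reached).
        return self.num_partitions, depth


class ToyUnionRDD:
    """Toy stand-in for a union: it asks every child for its partitions."""

    def __init__(self, children):
        self.children = list(children)

    def partitions(self, depth=1):
        total, deepest = 0, depth
        for child in self.children:
            # One extra level of recursion per nested union.
            count, child_depth = child.partitions(depth + 1)
            total += count
            deepest = max(deepest, child_depth)
        return total, deepest


leaves = [LeafRDD(2) for _ in range(200)]

# Assumed construction: each union wraps the previous one, so nesting
# depth grows linearly with the number of inputs.
chained = reduce(lambda acc, rdd: ToyUnionRDD([acc, rdd]), leaves)

# Alternative: one flat union over all inputs, constant nesting depth.
flat = ToyUnionRDD(leaves)

chained_count, chained_depth = chained.partitions()
flat_count, flat_depth = flat.partitions()
print("chained:", chained_count, "partitions, call depth", chained_depth)
print("flat:   ", flat_count, "partitions, call depth", flat_depth)
```

Both shapes report the same number of partitions; only the recursion depth differs. In PySpark, SparkContext.union(rdds) accepts a list of RDDs, which may be a way to get the flat shape if the nesting here did come from chained pairwise unions.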

          People

            Assignee: Unassigned
            Reporter: Eric Friedman (ericdf)
            Votes: 0
            Watchers: 6
