Spark / SPARK-25283

A deadlock in UnionRDD


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      The PR https://github.com/apache/spark/pull/21913 replaced Scala parallel collections in UnionRDD with a new parmap function. This change causes a deadlock in the partitions method. The following code demonstrates the problem:

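          import org.apache.spark.rdd.UnionRDD

          // `sc` is assumed to be an existing SparkContext, e.g. the one spark-shell provides.
          val wide = 20
          // Build a union of `num` single-partition RDDs.
          def unionRDD(num: Int): UnionRDD[Int] = {
            val rdds = (0 until num).map(_ => sc.parallelize(1 to 10, 1))
            new UnionRDD(sc, rdds)
          }
          // Nest unions three levels deep, with `wide` children at each level.
          val level0 = (0 until wide).map { _ =>
            val level1 = (0 until wide).map(_ => unionRDD(wide))
            new UnionRDD(sc, level1)
          }
          val rdd = new UnionRDD(sc, level0)

          // Computing the partitions of the nested union is where the deadlock occurs.
          rdd.partitions.length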
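
The report does not spell out the mechanism, so the following is only a sketch of one common way nested parallel maps can deadlock: every worker of a bounded pool blocks waiting for nested tasks that can only run on that same pool. It does not use Spark and is not the actual parmap implementation. `blockingParMap`, `nestedCompletes`, the pool sizes, and the timeouts are all hypothetical names and values chosen for illustration.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future, TimeoutException}
import scala.concurrent.duration._

object NestedPoolDeadlock {
  // Hypothetical stand-in for a parmap-like helper (not Spark's code):
  // runs `f` over `in` on the implicit pool and blocks the calling thread
  // until all results are ready or `timeout` expires.
  def blockingParMap[A, B](in: Seq[A], timeout: Duration)(f: A => B)(
      implicit ec: ExecutionContext): Seq[B] =
    Await.result(Future.sequence(in.map(a => Future(f(a)))), timeout)

  // Returns true if a two-level nested map finishes, false if the outer
  // call times out because the pool's workers are all blocked.
  def nestedCompletes(poolSize: Int, width: Int): Boolean = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      blockingParMap(0 until width, 1.second) { _ =>
        // The inner call submits its tasks to the SAME pool and then blocks
        // the worker thread it is running on.
        val inner = blockingParMap(0 until width, 5.seconds)(i => i * 2)
        inner.sum
      }
      true
    } catch {
      case _: TimeoutException => false
    } finally {
      // Interrupt any workers that are still blocked so the JVM can exit.
      pool.shutdownNow()
    }
  }
}
```

Under these assumptions, `NestedPoolDeadlock.nestedCompletes(poolSize = 2, width = 4)` should return false, because both workers block on inner tasks that never get a thread, while `nestedCompletes(poolSize = 8, width = 2)` has enough spare workers to finish and should return true. The nested UnionRDD above has the same shape: calling partitions on the outer union requests the partitions of each child union in turn.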
      

            People

              Assignee: maxgekk Max Gekk
              Reporter: maxgekk Max Gekk
              Votes: 0
              Watchers: 3
