SPARK-13747

Concurrent execution in SQL doesn't work with Scala ForkJoinPool

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1
    • Fix Version/s: 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Running the following code may fail with an IllegalArgumentException:

      (1 to 100).par.foreach { _ =>
        println(sc.parallelize(1 to 5).map { i => (i, i) }.toDF("a", "b").count())
      }
      
      java.lang.IllegalArgumentException: spark.sql.execution.id is already set 
              at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87) 
              at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) 
              at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) 
      

      This happens because SparkContext.runJob can be suspended when it runs in a ForkJoinPool (e.g., scala.concurrent.ExecutionContext.Implicits.global), as it calls Await.ready (introduced by https://github.com/apache/spark/pull/9264).

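      So when SparkContext.runJob is suspended, the ForkJoinPool runs another task on the same thread. However, that thread's local properties have already been polluted by the suspended task: spark.sql.execution.id is still set when the new task starts.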
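
      The failure mode described above can be modelled without Spark. The sketch below is only an illustration under stated assumptions: ExecutionIdPollution, runSuspended and simulate are hypothetical names, the ThreadLocal stands in for the spark.sql.execution.id local property, and the nested call stands in for the ForkJoinPool picking up a second task on a thread whose first task is suspended in Await.ready. It is not Spark's actual implementation.

```scala
// Spark-free model of the failure. All names are hypothetical stand-ins,
// not Spark's actual implementation.
object ExecutionIdPollution {
  // Stand-in for the thread-local property "spark.sql.execution.id".
  private val executionId = new ThreadLocal[String]

  // Mimics the check seen in the stack trace: refuse to start a new
  // execution if the current thread already carries an execution id.
  def withNewExecutionId[T](id: String)(body: => T): T = {
    if (executionId.get() != null) {
      throw new IllegalArgumentException("spark.sql.execution.id is already set")
    }
    executionId.set(id)
    try body finally executionId.remove()
  }

  // Task A sets its id and then "blocks". The callback models the pool
  // running another queued task on the same thread while A is suspended.
  def runSuspended(whileBlocked: () => Unit): Unit =
    withNewExecutionId("A") { whileBlocked() }

  // Returns the error message seen by the second task, if any.
  def simulate(): Option[String] =
    try {
      runSuspended(() => withNewExecutionId("B") { () })
      None
    } catch {
      case e: IllegalArgumentException => Some(e.getMessage)
    }

  def main(args: Array[String]): Unit =
    println(simulate())
}
```

      In this model the second task fails because it observes the id left on the thread by the first task, which has not finished yet. The real interleaving depends on ForkJoinPool scheduling, which is why the original reproduction only fails some of the time.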

            People

              Assignee: zsxwing Shixiong Zhu
              Reporter: zsxwing Shixiong Zhu