Spark / SPARK-45616

Usages of ParVector are unsafe because it does not propagate ThreadLocals or SparkSession


Details

    Description

      CastSuiteBase and ExpressionInfoSuite use ParVector.foreach() to run Spark SQL queries in parallel. They incorrectly assume that each parallel operation inherits the main thread's active SparkSession. That holds only when the parallel operations run in freshly created threads. If other code has already run some parallel operations before Spark was started, there may be existing threads that do not have an active SparkSession, and the parallel operations can be scheduled onto them. In that case, these tests fail with NullPointerExceptions when creating SparkPlans or running SQL queries.
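
      The failure mode can be reproduced without Spark. The sketch below assumes the active session is held in an inheritable thread-local, which is consistent with the behavior described above. The object name, the `activeSession` holder, and the string "session-1" are hypothetical stand-ins, and a single-thread executor plays the role of a worker thread that existed before Spark was started.

```scala
import java.util.concurrent.{Callable, Executors}

object InheritedOnlyAtCreation {
  // Hypothetical stand-in for the holder of the active SparkSession.
  val activeSession = new InheritableThreadLocal[String]

  // Returns (value seen by a pre-existing pooled thread,
  //          value seen by a thread created after the value was set).
  def observe(): (Option[String], Option[String]) = {
    val pool = Executors.newFixedThreadPool(1)
    try {
      // Force the pool to create its worker thread before any "session" exists.
      pool.submit(new Callable[Unit] { def call(): Unit = () }).get()

      // Now "start Spark": set the value on the main thread.
      activeSession.set("session-1")

      // The pooled thread was created earlier, so it inherited nothing.
      val fromPooled = pool.submit(new Callable[Option[String]] {
        def call(): Option[String] = Option(activeSession.get())
      }).get()

      // A fresh thread copies the parent's value at construction time.
      var fromFresh: Option[String] = None
      val t = new Thread(new Runnable {
        def run(): Unit = { fromFresh = Option(activeSession.get()) }
      })
      t.start()
      t.join()

      (fromPooled, fromFresh)
    } finally {
      pool.shutdown()
      activeSession.remove()
    }
  }
}
```

      Calling `InheritedOnlyAtCreation.observe()` returns `(None, Some("session-1"))`: the reused thread sees no value, which in the real tests surfaces as a NullPointerException.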

      The fix is to use the existing method ThreadUtils.parmap(). This method creates fresh threads that inherit the current active SparkSession, and it propagates the Spark ThreadLocals.
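
      A minimal sketch of the idea behind the fix, using only the standard library. `parmapFresh` is a hypothetical helper and not Spark's ThreadUtils.parmap(); it illustrates only the "fresh threads inherit the caller's state" part and does not attempt to propagate non-inheritable thread-locals. The `activeSession` holder is again a hypothetical stand-in.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object FreshThreadParmap {
  // Hypothetical stand-in for the holder of the active SparkSession.
  val activeSession = new InheritableThreadLocal[String]

  // Hypothetical helper: runs f over `in` on a pool created for this call only.
  // Worker threads are created while tasks are submitted from the calling
  // thread, so they inherit that thread's inheritable thread-locals.
  def parmapFresh[I, O](in: Seq[I], maxThreads: Int)(f: I => O): Seq[O] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = in.map(x => Future(f(x)))
      // Block until every task finishes; results keep the input order.
      futures.map(Await.result(_, Duration.Inf))
    } finally {
      pool.shutdown()
    }
  }
}
```

      With `activeSession` set on the calling thread, every task run through `parmapFresh` reads the same value, regardless of what parallel work ran earlier in the JVM. The cost is a new pool per call, which is acceptable in test code.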

      We should also add a scalastyle warning against use of ParVector.

            People

              ankurd Ankur Dave