SPARK-8628: Race condition in AbstractSparkSQLParser.parse

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 1.3.1, 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL

    Description

      SPARK-5009 introduced the following code in AbstractSparkSQLParser:

        def parse(input: String): LogicalPlan = {
          // Initialize the Keywords.
          lexical.initialize(reservedWords)
          phrase(start)(new lexical.Scanner(input)) match {
            case Success(plan, _) => plan
            case failureOrError => sys.error(failureOrError.toString)
          }
        }
      

      The corresponding initialize method in SqlLexical is not thread-safe:

        /* This is a work around to support the lazy setting */
        def initialize(keywords: Seq[String]): Unit = {
          reserved.clear()
          reserved ++= keywords
        }
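
To make the unsafe window concrete, here is a minimal sketch that replays one bad interleaving on a single thread, so the outcome is deterministic. `ToyLexical`, `classify`, and `InitializeWindowSketch` are hypothetical stand-ins, not Spark classes; only the two-step `initialize` mirrors the method quoted above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SqlLexical: only the mutable keyword set is modeled.
class ToyLexical {
  val reserved: mutable.HashSet[String] = mutable.HashSet.empty[String]

  // Same two steps as the initialize method quoted above.
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }

  // A word counts as a keyword only while it is present in `reserved`.
  def classify(word: String): String =
    if (reserved.contains(word.toUpperCase)) "Keyword" else "Identifier"
}

object InitializeWindowSketch {
  val keywords: Seq[String] = Seq("SELECT", "FROM", "WHERE")

  // Replays one bad interleaving of two callers, A and B, on a single thread.
  def replay(): (String, String, String) = {
    val lexical = new ToyLexical
    lexical.initialize(keywords)             // A: completes initialize
    val before = lexical.classify("select")  // A: word is a keyword
    lexical.reserved.clear()                 // B: first half of initialize
    val during = lexical.classify("select")  // A: keyword set is empty here
    lexical.reserved ++= keywords            // B: second half of initialize
    val after = lexical.classify("select")   // A: word is a keyword again
    (before, during, after)
  }

  def main(args: Array[String]): Unit =
    println(replay()) // prints (Keyword,Identifier,Keyword)
}
```

In the real parser the two halves run on different threads, so whether a scan falls into the middle step depends on timing.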
      

      I'm hitting this when parsing multiple SQL queries concurrently. When one thread starts parsing a query, initialize clears the shared reserved keyword set. Until the set is repopulated, queries being parsed on other threads fail because the lexer classifies their keywords as identifiers.
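
One possible mitigation, sketched below under stated assumptions, is to serialize `parse` so that the clear-and-refill step and the scan that depends on it cannot interleave across threads. This is not necessarily the change that was merged for this issue. `ToyParser` and `SynchronizedParseSketch` are hypothetical names, and the "parse" here only classifies the first word of the input.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.collection.mutable

// Hypothetical stand-in for the parser: parse reports how the first word is classified.
class ToyParser(reservedWords: Seq[String]) {
  private val reserved = mutable.HashSet.empty[String]

  private def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }

  // Holding this object's monitor for the whole call keeps another thread's
  // clear() from running between our initialize and our lookup.
  def parse(input: String): String = synchronized {
    initialize(reservedWords)
    val first = input.trim.split("\\s+").head
    if (reserved.contains(first.toUpperCase)) "Keyword" else "Identifier"
  }
}

object SynchronizedParseSketch {
  // Runs `threads` workers, each parsing `perThread` times; returns the number of misclassifications.
  def run(threads: Int, perThread: Int): Int = {
    val parser = new ToyParser(Seq("SELECT", "FROM", "WHERE"))
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val tasks = (1 to threads).map { _ =>
        pool.submit(new Callable[Int] {
          def call(): Int =
            (1 to perThread).count(_ => parser.parse("SELECT a FROM t") != "Keyword")
        })
      }
      tasks.map(_.get()).sum
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit =
    println(run(threads = 4, perThread = 1000))
}
```

The trade-off is that all parsing on one parser instance becomes sequential. An alternative that avoids the lock would be to build the keyword set once and never mutate it afterwards.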

          People

            Vinod KC (vinodkc)
            Santiago M. Mola (smolav)
            Michael Armbrust
            Votes: 0
            Watchers: 5
