Spark / SPARK-8628

Race condition in AbstractSparkSQLParser.parse

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.0, 1.3.1, 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL

      Description

      SPARK-5009 introduced the following code in AbstractSparkSQLParser:

      def parse(input: String): LogicalPlan = {
        // Initialize the Keywords.
        lexical.initialize(reservedWords)
        phrase(start)(new lexical.Scanner(input)) match {
          case Success(plan, _) => plan
          case failureOrError => sys.error(failureOrError.toString)
        }
      }

      The corresponding initialize method in SqlLexical is not thread-safe:

      /* This is a work around to support the lazy setting */
      def initialize(keywords: Seq[String]): Unit = {
        reserved.clear()
        reserved ++= keywords
      }

      I'm hitting this when parsing multiple SQL queries concurrently. Every call to parse clears the shared reserved keyword collection and then refills it. If another thread is lexing its own query during that window, it sees an empty or partially filled collection, recognizes keywords as identifiers, and the query fails to parse.
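
One possible way to close this window is to populate the keywords once instead of on every parse call. The sketch below illustrates that idea only and is not necessarily the change that resolved this issue. ToyLexical and ToyParser are hypothetical stand-ins for SqlLexical and AbstractSparkSQLParser, and classify is an invented helper that mimics how a lexer decides between a keyword and an identifier.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SqlLexical: owns a mutable reserved-word set.
class ToyLexical {
  val reserved = mutable.HashSet.empty[String]

  // Same shape as the initialize method quoted in the description.
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }

  // Invented helper: a token is a keyword only if it is in the reserved set.
  def classify(token: String): String =
    if (reserved.contains(token.toUpperCase)) "keyword" else "identifier"
}

// Hypothetical stand-in for AbstractSparkSQLParser.
class ToyParser(reservedWords: Seq[String]) {
  val lexical = new ToyLexical

  // A lazy val initializer runs at most once and is guarded by a lock,
  // so the set is never cleared again while other threads are reading it.
  private lazy val initLexical: Unit = lexical.initialize(reservedWords)

  def parse(input: String): Seq[String] = {
    initLexical // forces the one-time initialization
    input.split("\\s+").toSeq.map(lexical.classify)
  }
}
```

With this shape, calling new ToyParser(Seq("SELECT", "FROM")).parse("select a from t") from several threads only reads the set after the first initialization has completed. Wrapping the body of parse in synchronized would be a simpler alternative, at the cost of serializing all parsing.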

            People

            • Assignee: Vinod KC (vinodkc)
            • Reporter: Santiago M. Mola (smolav)
            • Shepherd: Michael Armbrust