Description
SPARK-5009 introduced the following code in AbstractSparkSQLParser:
    def parse(input: String): LogicalPlan = {
      // Initialize the Keywords.
      lexical.initialize(reservedWords)
      phrase(start)(new lexical.Scanner(input)) match {
        case Success(plan, _) => plan
        case failureOrError => sys.error(failureOrError.toString)
      }
    }
The corresponding initialize method in SqlLexical is not thread-safe:
    /* This is a work around to support the lazy setting */
    def initialize(keywords: Seq[String]): Unit = {
      reserved.clear()
      reserved ++= keywords
    }
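I'm hitting this when parsing multiple SQL queries concurrently. Every call to parse re-initializes the reserved keyword set by clearing it and then refilling it. If another thread is scanning its own query in the window between reserved.clear() and reserved ++= keywords, it sees an empty or partially filled set, recognizes keywords as identifiers, and its query fails to parse.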
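The window described above can be replayed on a single thread, which makes the outcome deterministic. The sketch below is illustrative only: UnsafeKeywords, SnapshotKeywords and KeywordRaceSketch are hypothetical stand-ins, not Spark classes, and the snapshot approach is just one possible way to close the window, not necessarily the fix Spark should adopt.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the lexer's keyword handling; not Spark's SqlLexical.
class UnsafeKeywords {
  val reserved: mutable.Set[String] = mutable.HashSet.empty[String]

  // Same two-step shape as the initialize method quoted above.
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()       // step 1: the set is now empty
    reserved ++= keywords  // step 2: the set is refilled
  }

  def isKeyword(word: String): Boolean = reserved.contains(word)
}

// One possible alternative (an assumption, not the project's fix):
// build an immutable set and publish it with a single volatile write,
// so a reader sees either the old set or the new one, never a cleared one.
class SnapshotKeywords {
  @volatile private var reserved: Set[String] = Set.empty

  def initialize(keywords: Seq[String]): Unit = {
    reserved = keywords.toSet
  }

  def isKeyword(word: String): Boolean = reserved.contains(word)
}

object KeywordRaceSketch {
  val keywords: Seq[String] = Seq("SELECT", "FROM", "WHERE")

  // Replays the bad interleaving by hand: a second parse has run clear()
  // but not yet refilled the set when a concurrent reader checks a keyword.
  def keywordSeenBetweenSteps(): Boolean = {
    val lexical = new UnsafeKeywords
    lexical.initialize(keywords)            // first parse has initialized
    lexical.reserved.clear()                // second parse: step 1 only
    val seen = lexical.isKeyword("SELECT")  // concurrent reader looks here
    lexical.reserved ++= keywords           // second parse: step 2
    seen
  }

  // With the snapshot variant, re-initialization has no intermediate state.
  def keywordSeenAfterReinit(): Boolean = {
    val lexical = new SnapshotKeywords
    lexical.initialize(keywords)
    lexical.initialize(keywords)
    lexical.isKeyword("SELECT")
  }
}
```

In the replayed interleaving, SELECT is not recognized as a keyword between the two steps, which matches the symptom of keywords being parsed as identifiers. Initializing the keyword set once instead of on every parse call would avoid the problem as well.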