Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24260

Support for multi-statement SQL in SparkSession.sql API

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      sparkSession.sql API only supports a single SQL statement to be executed for a call. A multi-statement SQL cannot be executed in a single call. For example,

      SparkSession sparkSession = SparkSession.builder().appName("MultiStatementSQL")                                          .master("local").config("", "").getOrCreate()
      sparkSession.sql("DROP TABLE IF EXISTS count_employees; CACHE TABLE employees; CREATE TEMPORARY VIEW count_employees AS SELECT count(*) as cnt FROM employees; SELECT * FROM count_employees"

      Above code fails with the error, 

      org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';' expecting <EOF>

      Solution to this problem is to use the .sql API multiple times in a specific order.

      sparkSession.sql("DROP TABLE IF EXISTS count_employees")
      sparkSession.sql("CACHE TABLE employees")
      sparkSession.sql("CREATE TEMPORARY VIEW count_employees AS SELECT count(*) as cnt FROM employees;")
      sparkSession.sql("SELECT * FROM count_employees")
      
      

      If these SQL statements come from a string / file, users have to implement their own parsers to execute this. Like,

      val sqlFromFile = """DROP TABLE IF EXISTS count_employees;
       |CACHE TABLE employees;
       |CREATE TEMPORARY VIEW count_employees AS SELECT count(*) as cnt FROM employees; SELECT * FROM count_employees""".stripMargin
      sqlFromFile.split(";")
      .forEach(line => sparkSession.sql(line))
      
      

      This naive parser can fail for many edge cases (like ";" inside a string). Even if users use the same grammar used by Spark and implement their own parsing, it can go out of sync with the way Spark parses the statements.

      Can support for multiple SQL statements be built into SparkSession.sql API itself?

       

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              ravindhranath Ravindra Nath Kakarla
            • Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated: