  1. Spark
  2. SPARK-12362

Create a full-fledged built-in SQL parser

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark currently uses two SQL parsers: a simple one based on Scala parser combinators, and another based on Hive.

      Neither is a good long-term solution. The parser combinator one gives users poor error messages and does not warn when the defined grammar contains conflicts. The Hive one depends directly on Hive itself, which makes it very difficult to introduce new grammar or fix bugs.
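
To illustrate the grammar-conflict problem described above, here is a minimal sketch of ordered-choice parser combinators. It is written in Python for brevity and is not Spark's code; the names `literal`, `choice`, and `keyword` are hypothetical. It shows how an earlier alternative can silently shadow a later one, and how a failed parse returns no position or expectation to report to the user.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position),
# or None when it does not match.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


def literal(word: str) -> Parser:
    """Match an exact string at the current position."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        if text.startswith(word, pos):
            return word, pos + len(word)
        return None
    return parse


def choice(*alternatives: Parser) -> Parser:
    """Ordered choice: the first alternative that matches wins."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return parse


# "IN" is a prefix of "INTERSECT", so the second alternative can never win.
# Nothing warns about this conflict when the grammar is defined.
keyword = choice(literal("IN"), literal("INTERSECT"))

# Only the first two characters are consumed; "INTERSECT" is shadowed.
shadowed = keyword("INTERSECT", 0)

# A failed parse is just None: no position, no list of expected tokens.
failed = keyword("SELEKT", 0)
```

A grammar-based parser generator can analyze the whole grammar ahead of time, which is one reason such a tool may be a better fit than hand-composed combinators.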

      The goal of this ticket is to create a single SQL query parser that is powerful enough to replace both existing ones. The requirements for the new parser are:

      1. Can support almost all of HiveQL
      2. Can support everything the existing SQL parser built with Scala parser combinators supports
      3. Can be used for expression parsing in addition to SQL query parsing
      4. Can provide good error messages for incorrect syntax
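
Requirements 3 and 4 can be sketched with a toy recursive-descent parser. This is an illustrative Python example under stated assumptions, not the parser this ticket produced: the class `Parser`, the functions `parse_expression` and `parse_query`, and the tiny `SELECT ... FROM ...` grammar are all hypothetical, and operator precedence is deliberately omitted. The point is that one set of grammar rules serves two entry points, and that errors name what was expected, what was found, and where.

```python
import re


class ParseError(Exception):
    """Raised with a message naming the expected token and the position."""


TOKEN = re.compile(r"\d+|\w+|[-+*/()]")
KEYWORDS = {"SELECT", "FROM"}


def tokenize(text):
    """Split text into (token, position) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ParseError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((match.group(), pos))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.text = text
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def fail(self, expected):
        if self.i < len(self.tokens):
            token, pos = self.tokens[self.i]
            found = repr(token)
        else:
            found, pos = "end of input", len(self.text)
        raise ParseError(f"expected {expected} but found {found} at position {pos}")

    def expect(self, word):
        token = self.peek()
        if token is None or token.upper() != word:
            self.fail(repr(word))
        self.i += 1

    def primary(self):
        token = self.peek()
        if token is not None and token.upper() not in KEYWORDS and (
            token.isdigit() or token.isidentifier()
        ):
            self.i += 1
            return token
        if token == "(":
            self.i += 1
            inner = self.expression()
            self.expect(")")
            return inner
        self.fail("a number, identifier, or '('")

    def expression(self):
        # Left-associative binary operators, all at one precedence level.
        left = self.primary()
        while self.peek() in ("+", "-", "*", "/"):
            operator = self.peek()
            self.i += 1
            left = (operator, left, self.primary())
        return left

    def query(self):
        self.expect("SELECT")
        projection = self.expression()
        self.expect("FROM")
        table = self.primary()
        return ("select", projection, table)

    def end(self):
        if self.peek() is not None:
            self.fail("end of input")


def parse_expression(text):
    """Entry point for a standalone expression."""
    parser = Parser(text)
    tree = parser.expression()
    parser.end()
    return tree


def parse_query(text):
    """Entry point for a full query; reuses the same expression rule."""
    parser = Parser(text)
    tree = parser.query()
    parser.end()
    return tree
```

In this sketch, `parse_expression("a + 1")` and `parse_query("SELECT a + 1 FROM t")` share the `expression` rule, and `parse_query("SELECT a + FROM t")` raises a `ParseError` that points at the offending `FROM` token rather than failing silently.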

      Rather than building one from scratch, we should investigate whether we can leverage existing open source projects such as Hive (by inlining the parser part) or Calcite.

        Issue Links

          1. Initial import of the Hive parser (Sub-task, Resolved, Nong Li)
          2. Add acknowledge that the parser was initially from Hive (Sub-task, Resolved, Herman van Hövell)
          3. Move parser from hive module to catalyst (or sql-core) module (Sub-task, Resolved, Herman van Hövell)
          4. Enable expression parsing (used in DataFrames) (Sub-task, Resolved, Herman van Hövell)
          5. Grammar parity with existing SQL parser (Sub-task, Resolved, Herman van Hövell)
          6. Migrate DDL parsing to the newly absorbed parser (Sub-task, Resolved, L. C. Hsieh)
          7. Parser should not silently ignore the distinct keyword used in an aggregate function when OVER clause is used (Sub-task, Resolved, L. C. Hsieh)
          8. Split IdentifiersParser.g to avoid single huge java source (Sub-task, Resolved, Davies Liu)
          9. better support of parentheses in partition by and order by clause of window function's over clause (Sub-task, Resolved, L. C. Hsieh)
          10. Support from clause surrounded by `()` (Sub-task, Resolved, L. C. Hsieh)
          11. Improve test coverage (Sub-task, Closed, Unassigned)
          12. Limit is not supported inside Set Operation (Sub-task, Closed, Unassigned)
          13. Parse numbers as decimals rather than doubles (Sub-task, Resolved, Herman van Hövell)
          14. Remove parser pluggability (Sub-task, Resolved, Reynold Xin)
          15. Migrate the SparkSQLParser to the new parser (Sub-task, Resolved, Herman van Hövell)
          16. Migrate the ExtendedHiveQlParser to the new parser (Sub-task, Resolved, Herman van Hövell)
          17. Rename ParserDialect -> ParserInterface (Sub-task, Resolved, Reynold Xin)
          18. Make Token pattern matching in the parser case insensitive (Sub-task, Closed, Reynold Xin)
          19. Implement command to set current database (Sub-task, Resolved, L. C. Hsieh)
          20. Subquery Alias in Hive Parser (Sub-task, Closed, Unassigned)
          21. Replace ANTLR3 SQL parser by a ANTLR4 SQL parser (Sub-task, Resolved, Herman van Hövell)
          22. Add DDL commands to ANTLR4 Parser (Sub-task, Resolved, Herman van Hövell)
          23. Remove ANTLR3 based parser (Sub-task, Resolved, Herman van Hövell)
          24. Migrate HiveQl parsing to ANTLR4 parser (Sub-task, Resolved, Herman van Hövell)
          25. Better error message for syntax error in the SQL parser (Sub-task, Resolved, Herman van Hövell)
          26. Audit non-reserved keyword list in ANTLR4 parser. (Sub-task, Resolved, Bo Meng)

            People

              hvanhovell Herman van Hövell
              rxin Reynold Xin