  1. Calcite
  2. CALCITE-3077

Rewrite CUBE&ROLLUP queries in SparkSqlDialect


    Details

      Description

      Background: We are building a platform that uses Calcite to process SQL queries (parse, validate, convert, and optimize) and then regenerate the final SQL. To handle large volumes of data, we execute the generated SQL on the popular Spark SQL engine.

      However, we found that a large share of our real-world test cases failed because of syntax differences in the
      CUBE/ROLLUP/GROUPING SETS clauses. For ROLLUP and CUBE, the Spark SQL dialect supports only the trailing "WITH ROLLUP" and "WITH CUBE" forms in the "GROUP BY" clause. The corresponding grammar [1] is defined below.

      aggregation
          : GROUP BY groupingExpressions+=expression (',' groupingExpressions+=expression)* (
            WITH kind=ROLLUP
          | WITH kind=CUBE
          | kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')')?
          | GROUP BY kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')'
      ;
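
      For the simplest case, a single ROLLUP or CUBE over plain columns, the grammar above suggests a direct mapping to the trailing WITH form. The following is only a minimal string-level sketch of that mapping. The class and method names (SparkGroupBySketch, render) are hypothetical and are not part of Calcite's SparkSqlDialect or any Calcite API.

```java
import java.util.List;

public class SparkGroupBySketch {
    /**
     * Renders one simple ROLLUP or CUBE over plain columns in the
     * "GROUP BY a, b WITH ROLLUP" form accepted by the grammar above.
     * Hypothetical helper, for illustration only.
     */
    public static String render(String kind, List<String> columns) {
        if (!kind.equals("ROLLUP") && !kind.equals("CUBE")) {
            throw new IllegalArgumentException("unsupported kind: " + kind);
        }
        return "GROUP BY " + String.join(", ", columns) + " WITH " + kind;
    }

    public static void main(String[] args) {
        // Standard form: GROUP BY ROLLUP(a, b)
        System.out.println(render("ROLLUP", List.of("a", "b")));
        // Standard form: GROUP BY CUBE(a, b)
        System.out.println(render("CUBE", List.of("a", "b")));
    }
}
```

      This covers only the case where the whole GROUP BY clause is one ROLLUP or CUBE over simple columns. Composite or multiple CUBE/ROLLUP items do not fit the trailing WITH form.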
      

      To fill this gap, I think we need to rewrite CUBE/ROLLUP/GROUPING SETS clauses in SparkSqlDialect, especially for complex cases such as:

      group by cube ((a, b), (c, d))
      group by cube(a,b), cube(c,d)
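
      One possible way to handle such cases is to expand them into explicit grouping sets, which the grammar above does accept. The sketch below assumes the standard SQL semantics: CUBE over n elements yields every subset of those elements, and several grouping items in one GROUP BY combine as a cross product. All names (CubeExpansionSketch, cube, cross, render) are hypothetical. This is not the implementation used in Calcite, and the exact output shape Spark expects is an assumption based on the grammar quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class CubeExpansionSketch {
    /** Expands CUBE over (possibly composite) elements into 2^n grouping sets. */
    public static List<List<String>> cube(List<List<String>> elements) {
        List<List<String>> sets = new ArrayList<>();
        int n = elements.size();
        // Count down so the full set comes first and the empty set last.
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<String> set = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    set.addAll(elements.get(i));
                }
            }
            sets.add(set);
        }
        return sets;
    }

    /** Cross product: every set on the left concatenated with every set on the right. */
    public static List<List<String>> cross(List<List<String>> left, List<List<String>> right) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> l : left) {
            for (List<String> r : right) {
                List<String> merged = new ArrayList<>(l);
                merged.addAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    /** Renders "GROUP BY cols GROUPING SETS (...)", one of the forms in the grammar. */
    public static String render(List<String> allColumns, List<List<String>> sets) {
        String body = sets.stream()
            .map(s -> "(" + String.join(", ", s) + ")")
            .collect(Collectors.joining(", "));
        return "GROUP BY " + String.join(", ", allColumns) + " GROUPING SETS (" + body + ")";
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a", "b", "c", "d");
        // group by cube ((a, b), (c, d)): 4 grouping sets
        System.out.println(render(cols, cube(List.of(List.of("a", "b"), List.of("c", "d")))));
        // group by cube(a,b), cube(c,d): 4 x 4 = 16 grouping sets
        List<List<String>> left = cube(List.of(List.of("a"), List.of("b")));
        List<List<String>> right = cube(List.of(List.of("c"), List.of("d")));
        System.out.println(render(cols, cross(left, right)));
    }
}
```

      The two example clauses are not equivalent: the first produces 4 grouping sets and the second 16. The sketch does not deduplicate grouping sets, which does not matter here because the column lists are disjoint.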
      

      [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4


            People

            • Assignee:
              donnyzone Feng Zhu
              Reporter:
              donnyzone Feng Zhu
            • Votes:
              0
              Watchers:
              4


                Time Tracking

                Estimated:
                Not Specified
                Remaining:
                0h
                Logged:
                1.5h