[CALCITE-3077] Rewrite CUBE&ROLLUP queries in SparkSqlDialect - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.20.0
Fix Version/s: 1.20.0
Component/s: core
Labels:
- pull-request-available

Description

Background: we are building a platform that uses Calcite to process SQL queries (parse, validate, convert, and optimize them) and then regenerate the final SQL. To handle large volumes of data, we execute the generated SQL with the popular Spark SQL engine.

However, we found that a large share of real-world test cases failed because of syntax differences in CUBE/ROLLUP/GROUPING SETS clauses. For ROLLUP and CUBE, the Spark SQL dialect supports only the "WITH ROLLUP" and "WITH CUBE" forms in the "GROUP BY" clause. The corresponding grammar [1] is defined below.

aggregation
    : GROUP BY groupingExpressions+=expression (',' groupingExpressions+=expression)* (
      WITH kind=ROLLUP
    | WITH kind=CUBE
    | kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')')?
    | GROUP BY kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')'
;
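
For the simplest case, a single ROLLUP or CUBE over plain columns, the grammar above allows a direct mapping onto Spark's "WITH" syntax. The sketch below illustrates only that mapping; the class `SparkWithRollupCube` and its `toSpark` method are hypothetical names, not part of Calcite's `SparkSqlDialect`.

```java
import java.util.List;

/**
 * Hypothetical helper (not Calcite's API): renders a single ROLLUP or CUBE over
 * plain columns using Spark SQL's "GROUP BY ... WITH ROLLUP/CUBE" syntax.
 */
final class SparkWithRollupCube {

  enum Kind { ROLLUP, CUBE }

  /**
   * Maps, for example:
   *   GROUP BY ROLLUP(a, b)  ->  GROUP BY a, b WITH ROLLUP
   *   GROUP BY CUBE(a, b)    ->  GROUP BY a, b WITH CUBE
   * Only applies when the GROUP BY holds exactly one ROLLUP/CUBE whose elements
   * are single expressions; other shapes need GROUPING SETS instead.
   */
  static String toSpark(Kind kind, List<String> columns) {
    // The grammar requires at least one grouping expression before WITH.
    if (columns.isEmpty()) {
      throw new IllegalArgumentException("WITH " + kind + " needs at least one expression");
    }
    return "GROUP BY " + String.join(", ", columns) + " WITH " + kind.name();
  }
}
```

This direct form stops working as soon as an element is composite, as in `CUBE((a, b), (c, d))`, because `WITH CUBE` always treats each listed expression as its own element.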

To fill this gap, I think we need to rewrite CUBE/ROLLUP/GROUPING SETS clauses in SparkSqlDialect, especially for complex cases such as:

group by cube ((a, b), (c, d))
group by cube(a,b), cube(c,d)
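
One general way to handle such cases is to expand every CUBE/ROLLUP into explicit grouping sets and emit the `GROUP BY expr, ... GROUPING SETS (...)` form from the grammar above. The sketch below assumes standard SQL semantics: CUBE over n elements yields all 2^n subsets, ROLLUP yields the n+1 prefixes, and several CUBE/ROLLUP items in one GROUP BY combine as a cross product. `GroupingSetsExpander` and its methods are hypothetical illustrations, not the code that was merged into Calcite.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: expands CUBE/ROLLUP into explicit grouping sets for Spark SQL. */
final class GroupingSetsExpander {

  /** CUBE: every subset of the elements; an element may be composite, e.g. (a, b). */
  static List<Set<String>> cube(List<List<String>> elements) {
    List<Set<String>> result = new ArrayList<>();
    int n = elements.size();
    // Count masks down so the full set comes first and the empty set last.
    for (int mask = (1 << n) - 1; mask >= 0; mask--) {
      Set<String> set = new LinkedHashSet<>();
      for (int i = 0; i < n; i++) {
        if ((mask & (1 << (n - 1 - i))) != 0) {
          set.addAll(elements.get(i));
        }
      }
      result.add(set);
    }
    return result;
  }

  /** ROLLUP: the n + 1 prefixes of the element list, longest first. */
  static List<Set<String>> rollup(List<List<String>> elements) {
    List<Set<String>> result = new ArrayList<>();
    for (int k = elements.size(); k >= 0; k--) {
      Set<String> set = new LinkedHashSet<>();
      for (int i = 0; i < k; i++) {
        set.addAll(elements.get(i));
      }
      result.add(set);
    }
    return result;
  }

  /** Several CUBE/ROLLUP items in one GROUP BY combine as a cross product of their sets. */
  static List<Set<String>> crossProduct(List<List<Set<String>>> items) {
    List<Set<String>> result = new ArrayList<>();
    result.add(new LinkedHashSet<>()); // identity: a single empty grouping set
    for (List<Set<String>> item : items) {
      List<Set<String>> next = new ArrayList<>();
      for (Set<String> left : result) {
        for (Set<String> right : item) {
          Set<String> union = new LinkedHashSet<>(left);
          union.addAll(right);
          next.add(union);
        }
      }
      result = next;
    }
    return result;
  }

  /**
   * Renders "GROUP BY <all columns> GROUPING SETS ((...), ...)".
   * Assumes at least one set is non-empty; the grammar needs a leading expression.
   */
  static String toSparkGroupBy(List<Set<String>> sets) {
    Set<String> columns = new LinkedHashSet<>();
    sets.forEach(columns::addAll);
    List<String> rendered = new ArrayList<>();
    for (Set<String> set : sets) {
      rendered.add("(" + String.join(", ", set) + ")");
    }
    return "GROUP BY " + String.join(", ", columns)
        + " GROUPING SETS (" + String.join(", ", rendered) + ")";
  }
}
```

Applied to the first example, `cube` over the composite elements `(a, b)` and `(c, d)` yields four grouping sets and renders as `GROUP BY a, b, c, d GROUPING SETS ((a, b, c, d), (a, b), (c, d), ())`. For the second example, the cross product of `CUBE(a, b)` and `CUBE(c, d)` yields the same 16 grouping sets as `CUBE(a, b, c, d)`, so that case could also be emitted as `GROUP BY a, b, c, d WITH CUBE`.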

[1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

Issue Links

is related to

CALCITE-5411 Update Spark Dialect to support ROLLUP & CUBE aggregate functions (Closed)

links to

GitHub Pull Request #1231

GitHub Pull Request #1232

People

Assignee: Feng Zhu

Reporter: Feng Zhu

Votes: 0

Watchers: 4

Dates

Created: 18/May/19 03:15

Updated: 01/Dec/22 15:17

Resolved: 27/May/19 03:03

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1.5h