Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
We implemented the GROUP_ID() function in CALCITE-512 but we got the specification wrong, and it returns the wrong result.
GROUP_ID is not in the SQL standard. It is implemented only by Oracle.
I mistakenly believed that GROUP_ID() is equivalent to GROUPING_ID(g1, ..., gn) (in a query with GROUP BY g1, ..., gn). In fact, GROUP_ID is useful only if you have duplicate grouping sets. If grouping sets are distinct, GROUP_ID() will always return zero.
Example 1
SELECT deptno, job, GROUP_ID() AS g FROM Emp GROUP BY ROLLUP(deptno, job) DEPTNO JOB G ---------- --------- ---------- 10 CLERK 0 10 MANAGER 0 10 PRESIDENT 0 10 0 20 CLERK 0 20 ANALYST 0 20 MANAGER 0 20 0 30 CLERK 0 30 MANAGER 0 30 SALESMAN 0 30 0 0
produces grouping sets (deptno, job), (deptno), (). These are distinct, so GROUP_ID() is 0 for all rows.
Example 2
SELECT deptno, GROUP_ID() AS g FROM Emp GROUP BY GROUPING SETS (deptno, (), ()); DEPTNO G ---------- ---------- 10 0 20 0 30 0 0 1
As you can see, the grouping set () occurs twice. So there is one row in the result for each occurrence: the first occurrence has g = 0; the second has g = 1.
In my fix for CALCITE-1069, I will change GROUP_ID() to always return 0. This is wrong, but nevertheless closer to the required behavior.
Attachments
Issue Links
- relates to
-
CALCITE-4748 If there are duplicate GROUPING SETS, Calcite should return duplicate rows
- Closed
-
CALCITE-4220 In SqlToRelConverter, use RelBuilder for creating Aggregate
- Closed
- links to