Details
-
Bug
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
Description
This is a problem in BigQuery, and may be a problem in other dialects as well. Consider the following piece of valid BQ SQL:
SELECT REPEAT(first_name, 2), COUNT(id) FROM looker_test.users GROUP BY REPEAT(first_name, 2) ORDER BY 1
Now consider a version where the ORDER BY clause is changed to this:
ORDER BY REPEAT(first_name, 2)
This is logically the same query, because the expression in the ORDER clause is the same as the one in the SELECT / GROUP clauses. BigQuery is sophisticated enough to match the select to the group expression in both queries, but cannot match either with the order expression. It gives this error: ORDER BY clause expression references column first_name which is neither grouped nor aggregated.
So, when sorting by complex expressions in BQ, Calcite relies on either:
- No null direction emulation required.
- sorting by alias or ordinal, which is a problem with current null direction emulation because it adds an extra complex sort expression, or
- having a query that just happens to also have the underlying field in the GROUP BY clause by itself, which seems to actually happen pretty often in my testing, but obviously shouldn't be a constraint.
As I wrote that, I realized this may be easily fixable for BQ since it added support for NULLS FIRST / LAST in 2020 and it seems Calcite has not caught up yet. Consider this rel node:
LogicalSort(sort0=[$0], dir0=[ASC]) LogicalAggregate(group=[{0}], cent=[COUNT($1)]) LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3]) JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
Calcite would convert it to this in BigQuery due to null direction emulation:
SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent FROM SCOTT.EMP GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
Which of course triggers the problem described above. This may be a problem for MSSQL as well since it doesn't support NULLS LAST. The fix for BQ, at least, may be to just support NULLS LAST and sort by ordinal.
Attachments
Issue Links
- is related to
-
CALCITE-5793 Use NULLS FIRST / LAST when unparsing in BigQuery
- Closed
- relates to
-
CALCITE-5724 Generated SQL uses literal values in ORDER BY clauses
- Open
-
CALCITE-4987 JDBC adapter generates incorrect query when ORDER BY alias collides with field in SELECT ITEM
- In Progress