[CALCITE-5767] JDBC adapter for MSSQL adds GROUPING to ORDER BY clause twice when emulating NULLS LAST - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.35.0
Component/s: core
Labels:
- pull-request-available

Description

JDBC adapter for MSSQL adds a call to the GROUPING function to the ORDER BY clause twice when emulating NULLS LAST. This is a problem because MSSQL disallows duplicate sort keys; it is caused because the MSSQL dialect wrongly gives the GROUPING function special treatment when emulating NULL direction.

MssqlSqlDialect.emulateNullDirection has special logic for GROUPING calls. This seems to be an optimization attempt since GROUPING is known to never return NULL, and therefore needs no null direction emulation. However, this causes problems because:

1. MSSQL does not have the same default null direction as Calcite's RelBuilder (and most other dialects).
2. MSSQL does not support the common NULLS FIRST or NULLS LAST syntax.
3. MSSQL does not allow sorting on the same field twice, even though there is no theoretical issue with this.

Each of these properties must be present for the problem to occur, so it's a bit niche and specific to MSSQL. Seems like the best solution is to simply eliminate the special-case treatment for GROUPING in emulateNullDirection, which is currently creating problems due to property #3.

More in-depth explanation:

In RelBuilder.collation, we use the "default null direction" to insert rex nodes as sorting expressions, but this is only the default null direction for NULLS-high dialects, i.e. not MSSQL. This is a problem because MSSQL has special-case logic for emulating null direction of GROUPING calls, whereby it effectively duplicates the expression. Really, MssqlSqlDialect.emulateNullDirection probably should've been returning null instead, signalling to callers that no null-direction emulation is necessary because GROUPING never returns NULL, but this causes another problem due to property #2 above when the null direction is non-default as is caused simply by using RelBuilder.collation as described above (it should be noted that this method takes rex nodes instead of RelFieldCollation object, so there is no way to specify null direction) because the non-default null direction is not expanded into a CASE expression (MSSQL does not support NULLS FIRST or LAST syntax).

Here's a test illustrating the problem:

Input SQL (default dialect)

select "product_class_id", "brand_name", GROUPING("brand_name")
from "product"
group by GROUPING SETS (("product_class_id", "brand_name"), ("product_class_id"))
order by 3, 2, 1

Current behavior for unparsing as MSSQL (incorrect because it orders by the same column twice; GROUPING([brand_name]) and 3, which will fail if you try to actually run this against a real MSSQL database, even though it seems like it shouldn't):

SELECT [product_class_id], [brand_name], GROUPING([brand_name])
FROM [foodmart].[product]
GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
ORDER BY
         GROUPING([brand_name]),
         3,
         CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
         [brand_name],
         CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
         [product_class_id]

Behavior where MssqlSqlDialect.emulateNullDirection simply returns null for GROUPING expressions (incorrect because it uses NULLS LAST syntax):

...
ORDER BY
         3 NULLS LAST,
         CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
         [brand_name],
         CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
         [product_class_id]

Acceptable behavior (although the first ORDER BY-clause is effectively ordering by a constant, this will at least run and produce the correct results):

...
ORDER BY
         CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
         3,
         CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
         [brand_name],
         CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
         [product_class_id]

Attachments

Issue Links

links to

GitHub Pull Request #3251

Activity

People

Assignee:: Will Noble

Reporter:: Will Noble

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 07/Jun/23 21:51

Updated:: 27/Jul/23 10:22

Resolved:: 10/Jun/23 19:54