Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5767

JDBC adapter for MSSQL adds GROUPING to ORDER BY clause twice when emulating NULLS LAST

    XMLWordPrintableJSON

Details

    Description

      JDBC adapter for MSSQL adds a call to the GROUPING function to the ORDER BY clause twice when emulating NULLS LAST. This is a problem because MSSQL disallows duplicate sort keys; it is caused because the MSSQL dialect wrongly gives the GROUPING function special treatment when emulating NULL direction.

      MssqlSqlDialect.emulateNullDirection has special logic for GROUPING calls. This seems to be an optimization attempt since GROUPING is known to never return NULL, and therefore needs no null direction emulation. However, this causes problems because:

      1. MSSQL does not have the same default null direction as Calcite's RelBuilder (and most other dialects).
      2. MSSQL does not support the common NULLS FIRST or NULLS LAST syntax.
      3. MSSQL does not allow sorting on the same field twice, even though there is no theoretical issue with this.

      Each of these properties must be present for the problem to occur, so it's a bit niche and specific to MSSQL. Seems like the best solution is to simply eliminate the special-case treatment for GROUPING in emulateNullDirection, which is currently creating problems due to property #3.

      More in-depth explanation:

      In RelBuilder.collation, we use the "default null direction" to insert rex nodes as sorting expressions, but this is only the default null direction for NULLS-high dialects, i.e. not MSSQL. This is a problem because MSSQL has special-case logic for emulating null direction of GROUPING calls, whereby it effectively duplicates the expression. Really, MssqlSqlDialect.emulateNullDirection probably should've been returning null instead, signalling to callers that no null-direction emulation is necessary because GROUPING never returns NULL, but this causes another problem due to property #2 above when the null direction is non-default as is caused simply by using RelBuilder.collation as described above (it should be noted that this method takes rex nodes instead of RelFieldCollation object, so there is no way to specify null direction) because the non-default null direction is not expanded into a CASE expression (MSSQL does not support NULLS FIRST or LAST syntax).

      Here's a test illustrating the problem:

      Input SQL (default dialect)

      select "product_class_id", "brand_name", GROUPING("brand_name")
      from "product"
      group by GROUPING SETS (("product_class_id", "brand_name"), ("product_class_id"))
      order by 3, 2, 1
      

      Current behavior for unparsing as MSSQL (incorrect because it orders by the same column twice; GROUPING([brand_name]) and 3, which will fail if you try to actually run this against a real MSSQL database, even though it seems like it shouldn't):

      SELECT [product_class_id], [brand_name], GROUPING([brand_name])
      FROM [foodmart].[product]
      GROUP BY GROUPING SETS(([product_class_id], [brand_name]), [product_class_id])
      ORDER BY
               GROUPING([brand_name]),
               3,
               CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
               [brand_name],
               CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
               [product_class_id]
      

      Behavior where MssqlSqlDialect.emulateNullDirection simply returns null for GROUPING expressions (incorrect because it uses NULLS LAST syntax):

      ...
      ORDER BY
               3 NULLS LAST,
               CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
               [brand_name],
               CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
               [product_class_id]
      

      Acceptable behavior (although the first ORDER BY-clause is effectively ordering by a constant, this will at least run and produce the correct results):

      ...
      ORDER BY
               CASE WHEN GROUPING([brand_name]) IS NULL THEN 0 ELSE 1 END,
               3,
               CASE WHEN [brand_name] IS NULL THEN 1 ELSE 0 END,
               [brand_name],
               CASE WHEN [product_class_id] IS NULL THEN 1 ELSE 0 END,
               [product_class_id]
      

      Attachments

        Issue Links

          Activity

            People

              wnoble Will Noble
              wnoble Will Noble
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: