Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5775

Null direction emulation broken for complex expressions on some dialects

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • core
    • None

    Description

      This is a problem in BigQuery, and may be a problem in other dialects as well. Consider the following piece of valid BQ SQL:

      SELECT REPEAT(first_name, 2),
             COUNT(id)
      FROM looker_test.users
      GROUP BY REPEAT(first_name, 2)
      ORDER BY 1
      

      Now consider a version where the ORDER BY clause is changed to this:

      ORDER BY REPEAT(first_name, 2)
      

      This is logically the same query, because the expression in the ORDER clause is the same as the one in the SELECT / GROUP clauses. BigQuery is sophisticated enough to match the select to the group expression in both queries, but cannot match either with the order expression. It gives this error: ORDER BY clause expression references column first_name which is neither grouped nor aggregated.

      So, when sorting by complex expressions in BQ, Calcite relies on either:

      • No null direction emulation required.
      • sorting by alias or ordinal, which is a problem with current null direction emulation because it adds an extra complex sort expression, or
      • having a query that just happens to also have the underlying field in the GROUP BY clause by itself, which seems to actually happen pretty often in my testing, but obviously shouldn't be a constraint.

      As I wrote that, I realized this may be easily fixable for BQ since it added support for NULLS FIRST / LAST in 2020 and it seems Calcite has not caught up yet. Consider this rel node:

      LogicalSort(sort0=[$0], dir0=[ASC])
        LogicalAggregate(group=[{0}], cent=[COUNT($1)])
          LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
            JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
      

      Calcite would convert it to this in BigQuery due to null direction emulation:

      SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent
      FROM SCOTT.EMP
      GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
      ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
      

      Which of course triggers the problem described above. This may be a problem for MSSQL as well since it doesn't support NULLS LAST. The fix for BQ, at least, may be to just support NULLS LAST and sort by ordinal.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              wnoble Will Noble
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: