Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3593

RelToSqlConverter changes target of ambiguous HAVING clause with a Project on Filter on Aggregate

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments



      Best shown with the shell of a test in RelToSqlConverter test.

       The following SQL on BigQuery 

      select product_id - 1000 as product_id
      from (
       select product_id, avg(gross_weight) as agw
       from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
       where net_weight < 100
       group by product_id having product_id > 0)

      produces one result, because the having filter applies to the product id before subtraction, of course.

      Running it through the machinery in that test (`sql(query).withBigQuery().ok(expected)`) translates it to:

      SELECT product_id - 1000 AS product_id
      from (SELECT 1 as product_id, 70 as net_weight, 170 as gross_weight) as product
      WHERE net_weight < 100
      GROUP BY product_id
      HAVING product_id > 0

      This changes the meaning of the query: now the HAVING is on the after-subtraction product_id and you get no results, rather than the one result.

      Note that this is not true in HyperSQL, as it has different semantics around the HAVING namespace.

      select "product_id" - 1000 as "product_id"
      from (
       select "product_id", avg("gross_weight") as agw
       from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM (VALUES(0))) as product
       where "net_weight" < 100
       group by "product_id" having "product_id" > 0)


      SELECT "product_id" - 1000 AS "product_id"
      from (SELECT 1 as "product_id", 70 as "net_weight", 170 as "gross_weight" FROM (VALUES(0))) as product
      WHERE "net_weight" < 100
      GROUP BY "product_id"
      HAVING "product_id" > 0

      But the meaning is preserved: both return a row. 

      I'm not enough of a SQL standards expert to know which one is being more compliant, but presumably both would have to be supported via some sort of flag?

      I think the fix would be to force the subselect on dialects such as BigQuery that have this behavior. Probably something that looks a lot like https://github.com/apache/calcite/blob/3530daaa8cad43aad6845b6c79e4bc1ca0e72f5f/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L1043-L1047

      The test, of course, looks like pretty silly SQL no one would ever write, but the point is this is what's generated when you have 

      Project f(x) as x
        Filter g(x)
          Aggregate {<x>}, ...


        Issue Links



            • Assignee:
              jinxing6042@126.com Jin Xing
              swtalbot Steven Talbot


              • Created:

                Time Tracking

                Original Estimate - Not Specified
                Not Specified
                Remaining Estimate - 0h
                Time Spent - 0.5h

                  Issue deployment