Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1306

Allow GROUP BY and HAVING to reference SELECT expressions by ordinal and alias

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: None
    • Labels:

      Description

      Allow GROUP BY and HAVING to reference SELECT expressions by ordinal and alias. It is not standard SQL, but MySQL and PostgreSQL allow it.

      See Stack Overflow: SQL - using alias in Group By.

      It would be enabled only by new methods isGroupByOrdinal and isGroupByAlias in SqlConformance.

      We might allow alias in the HAVING clause (as described in HIVE-10557) but ordinal does not make sense.

      Expressions that are not available before grouping would be illegal; for instance:

      select count(*) as c
      from t
      group by c
      

      We'd also need rules to resolve ambiguous expressions. For instance, in

      select e.empno as deptno
      from emp as e join dept as d
      where e.deptno = d.deptno
      group by deptno
      

      does deptno refer to e.deptno, d.deptno, or e.empno?

        Issue Links

          Activity

          Hide
          xiadao Dragobin added a comment -

          Is there a development plan for this issue?

          Show
          xiadao Dragobin added a comment - Is there a development plan for this issue?
          Hide
          julianhyde Julian Hyde added a comment -

          No.

          Show
          julianhyde Julian Hyde added a comment - No.
          Hide
          rajeshbabu Rajeshbabu Chintaguntla added a comment -

          Julian Hyde FYI I am working on this. Can this issue can be assigned to me?

          Show
          rajeshbabu Rajeshbabu Chintaguntla added a comment - Julian Hyde FYI I am working on this. Can this issue can be assigned to me?
          Hide
          julianhyde Julian Hyde added a comment -

          Rajeshbabu Chintaguntla, I have assigned to you. I have also made you a contributor, so you can assign to yourself in future.

          Show
          julianhyde Julian Hyde added a comment - Rajeshbabu Chintaguntla , I have assigned to you. I have also made you a contributor, so you can assign to yourself in future.
          Hide
          rajeshbabu Rajeshbabu Chintaguntla added a comment -

          Here is the PR created for this https://github.com/apache/calcite/pull/413
          Julian Hyde Please review.

          Show
          rajeshbabu Rajeshbabu Chintaguntla added a comment - Here is the PR created for this https://github.com/apache/calcite/pull/413 Julian Hyde Please review.
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Here are a few tricky cases that I didn't see tested:

          • The query SELECT a, COUNT(*) AS c FROM t GROUP BY a, c is cyclic and must fail in validation.
          • Are large literals interpreted as column references (and therefore fail because out of range) or are they treated as integers? E.g. SELECT deptno FROM emp GROUP BY deptno, 100. See what Postgres does and make sure we do the same.
          • Are literals inside expressions interpreted as column references? E.g. SELECT COUNT(*) FROM (SELECT 1 FROM emp GROUP BY substring(name FROM 2 FOR 3)). See what Postgres does and make sure we do the same.
          • Check that if the conformance disables them, we get the old behavior for aliases and ordinals.
          • Should not be able to use alias in an aggregate function in the HAVING clause. E.g. SELECT deptno AS x FROM emp HAVING min(x) < 20 is an error.
          • If an alias matches a column, the alias wins. E.g. SELECT COUNT(*) FROM (SELECT gender AS deptno FROM emp GROUP BY deptno) should return 2. Please see what Postgres does. Also test a HAVING query.
          • If an alias matches two columns, the alias wins, and the query is not ambiguous. E.g. SELECT COUNT(*) FROM (SELECT gender AS deptno FROM emp, dept GROUP BY deptno). Also test a HAVING query.
          • Matching is according to the case of the session. Thus SELECT x + y AS "z" FROM t GROUP BY "Z" is valid if case sensitivity is off.
          • Expressions involving aliases, e.g. SELECT a + b AS c, a + b + d, COUNT(*) FROM t GROUP BY c, d should be valid, because even though a + b + d is not grouped, it matches (a + b) + d, which combines two grouped expressions. Check what Postgres does.
          • Referencing aliases in the SELECT clause is not valid, SELECT a + b AS c, c + d FROM t GROUP BY c, d. Again, see what Postgres does.

          We need to test sql-to-rel conversion for some of the cases that do not fail but have different behavior, e.g. large literals, alias same name as columns.

          Matching on error message is dodgy:

          e.getCause().getMessage().equals(RESOURCE.columnNotFound(id.names.get(0)).str())

          When comparing enum values use == not equals, or better, switch. See

          if (root.getKind().equals(SqlKind.CUBE)

          .

          Show
          julianhyde Julian Hyde added a comment - - edited Here are a few tricky cases that I didn't see tested: The query SELECT a, COUNT(*) AS c FROM t GROUP BY a, c is cyclic and must fail in validation. Are large literals interpreted as column references (and therefore fail because out of range) or are they treated as integers? E.g. SELECT deptno FROM emp GROUP BY deptno, 100 . See what Postgres does and make sure we do the same. Are literals inside expressions interpreted as column references? E.g. SELECT COUNT(*) FROM (SELECT 1 FROM emp GROUP BY substring(name FROM 2 FOR 3)) . See what Postgres does and make sure we do the same. Check that if the conformance disables them, we get the old behavior for aliases and ordinals. Should not be able to use alias in an aggregate function in the HAVING clause. E.g. SELECT deptno AS x FROM emp HAVING min(x) < 20 is an error. If an alias matches a column, the alias wins. E.g. SELECT COUNT(*) FROM (SELECT gender AS deptno FROM emp GROUP BY deptno) should return 2. Please see what Postgres does. Also test a HAVING query. If an alias matches two columns, the alias wins, and the query is not ambiguous. E.g. SELECT COUNT(*) FROM (SELECT gender AS deptno FROM emp, dept GROUP BY deptno) . Also test a HAVING query. Matching is according to the case of the session. Thus SELECT x + y AS "z" FROM t GROUP BY "Z" is valid if case sensitivity is off. Expressions involving aliases, e.g. SELECT a + b AS c, a + b + d, COUNT(*) FROM t GROUP BY c, d should be valid, because even though a + b + d is not grouped, it matches (a + b) + d , which combines two grouped expressions. Check what Postgres does. Referencing aliases in the SELECT clause is not valid, SELECT a + b AS c, c + d FROM t GROUP BY c, d . Again, see what Postgres does. We need to test sql-to-rel conversion for some of the cases that do not fail but have different behavior, e.g. large literals, alias same name as columns. Matching on error message is dodgy: e.getCause().getMessage().equals(RESOURCE.columnNotFound(id.names.get(0)).str()) When comparing enum values use == not equals, or better, switch. See if (root.getKind().equals(SqlKind.CUBE) .
          Hide
          rajeshbabu Rajeshbabu Chintaguntla added a comment -

          Julian Hyde Thanks for review. Committed the changes handling review comments to pull request at https://github.com/apache/calcite/pull/413.

          The query SELECT a, COUNT AS c FROM t GROUP BY a, c is cyclic and must fail in validation.

          It fails with aggregate functions not in group by in both calcite and Postgres. Added test as well.

          Are large literals interpreted as column references (and therefore fail because out of range) or are they treated as integers? E.g. SELECT deptno FROM emp GROUP BY deptno, 100. See what Postgres does and make sure we do the same.

          Postgres fails with ordinal position not in range same in calcite after my changes. Added test as well.

          Are literals inside expressions interpreted as column references? E.g. SELECT COUNT FROM (SELECT 1 FROM emp GROUP BY substring(name FROM 2 FOR 3)). See what Postgres does and make sure we do the same.

          This is valid in Postgres and same in calcite now. 2 and 3 are considered as liternals not an ordinal. Added test as well.

          Check that if the conformance disables them, we get the old behavior for aliases and ordinals.

          Added the tests which fail without conformance. Added test as well.

          Should not be able to use alias in an aggregate function in the HAVING clause. E.g. SELECT deptno AS x FROM emp HAVING min < 20 is an error.

          Not allowing alias in having aggregate expression now with my changes. Added test as well.

          If an alias matches a column, the alias wins. E.g. SELECT COUNT FROM (SELECT gender AS deptno FROM emp GROUP BY deptno) should return 2. Please see what Postgres does. Also test a HAVING query.

          Giving priority to alias than column when there is conflict. But Postgres gives priority to column name and fail with ambiguous column. Added test as well.

          If an alias matches two columns, the alias wins, and the query is not ambiguous. E.g. SELECT COUNT FROM (SELECT gender AS deptno FROM emp, dept GROUP BY deptno). Also test a HAVING query.

          Giving priority to alias than column name. Added test as well.

          Matching is according to the case of the session. Thus SELECT x + y AS "z" FROM t GROUP BY "Z" is valid if case sensitivity is off.

          It works if sensitivity is off added test as well.

          Expressions involving aliases, e.g. SELECT a + b AS c, a + b + d, COUNT FROM t GROUP BY c, d should be valid, because even though a + b + d is not grouped, it matches (a + b) + d, which combines two grouped expressions. Check what Postgres does.

          This works as well and added test cases for the same. Postgres allows the same.

          Referencing aliases in the SELECT clause is not valid, SELECT a + b AS c, c + d FROM t GROUP BY c, d. Again, see what Postgres does.

          This is not valid in calcite and Postgres.

          We need to test sql-to-rel conversion for some of the cases that do not fail but have different behavior, e.g. large literals, alias same name as columns.

          Added some sql-to-rel tests for corner cases.

          e.getCause().getMessage().equals(RESOURCE.columnNotFound(id.names.get(0)).str())

          Removed this.

          if (root.getKind().equals(SqlKind.CUBE)

          Changed to switch case.

          Please review the changes.

          Show
          rajeshbabu Rajeshbabu Chintaguntla added a comment - Julian Hyde Thanks for review. Committed the changes handling review comments to pull request at https://github.com/apache/calcite/pull/413 . The query SELECT a, COUNT AS c FROM t GROUP BY a, c is cyclic and must fail in validation. It fails with aggregate functions not in group by in both calcite and Postgres. Added test as well. Are large literals interpreted as column references (and therefore fail because out of range) or are they treated as integers? E.g. SELECT deptno FROM emp GROUP BY deptno, 100. See what Postgres does and make sure we do the same. Postgres fails with ordinal position not in range same in calcite after my changes. Added test as well. Are literals inside expressions interpreted as column references? E.g. SELECT COUNT FROM (SELECT 1 FROM emp GROUP BY substring(name FROM 2 FOR 3)). See what Postgres does and make sure we do the same. This is valid in Postgres and same in calcite now. 2 and 3 are considered as liternals not an ordinal. Added test as well. Check that if the conformance disables them, we get the old behavior for aliases and ordinals. Added the tests which fail without conformance. Added test as well. Should not be able to use alias in an aggregate function in the HAVING clause. E.g. SELECT deptno AS x FROM emp HAVING min < 20 is an error. Not allowing alias in having aggregate expression now with my changes. Added test as well. If an alias matches a column, the alias wins. E.g. SELECT COUNT FROM (SELECT gender AS deptno FROM emp GROUP BY deptno) should return 2. Please see what Postgres does. Also test a HAVING query. Giving priority to alias than column when there is conflict. But Postgres gives priority to column name and fail with ambiguous column. Added test as well. If an alias matches two columns, the alias wins, and the query is not ambiguous. E.g. SELECT COUNT FROM (SELECT gender AS deptno FROM emp, dept GROUP BY deptno). Also test a HAVING query. Giving priority to alias than column name. Added test as well. Matching is according to the case of the session. Thus SELECT x + y AS "z" FROM t GROUP BY "Z" is valid if case sensitivity is off. It works if sensitivity is off added test as well. Expressions involving aliases, e.g. SELECT a + b AS c, a + b + d, COUNT FROM t GROUP BY c, d should be valid, because even though a + b + d is not grouped, it matches (a + b) + d, which combines two grouped expressions. Check what Postgres does. This works as well and added test cases for the same. Postgres allows the same. Referencing aliases in the SELECT clause is not valid, SELECT a + b AS c, c + d FROM t GROUP BY c, d. Again, see what Postgres does. This is not valid in calcite and Postgres. We need to test sql-to-rel conversion for some of the cases that do not fail but have different behavior, e.g. large literals, alias same name as columns. Added some sql-to-rel tests for corner cases. e.getCause().getMessage().equals(RESOURCE.columnNotFound(id.names.get(0)).str()) Removed this. if (root.getKind().equals(SqlKind.CUBE) Changed to switch case. Please review the changes.
          Hide
          julianhyde Julian Hyde added a comment -

          Thanks; reviewing now.

          Show
          julianhyde Julian Hyde added a comment - Thanks; reviewing now.
          Hide
          julianhyde Julian Hyde added a comment -

          Very nice work, Rajeshbabu Chintaguntla. I re-organized and extended a little and pushed to https://github.com/julianhyde/calcite/tree/1306-group-alias. Only one very minor quibble: I think "select deptno as dno from emp group by cube(1)" should be valid in the default conformance (because you can cube by expressions). Can you fix that and we're done.

          Show
          julianhyde Julian Hyde added a comment - Very nice work, Rajeshbabu Chintaguntla . I re-organized and extended a little and pushed to https://github.com/julianhyde/calcite/tree/1306-group-alias . Only one very minor quibble: I think "select deptno as dno from emp group by cube(1)" should be valid in the default conformance (because you can cube by expressions). Can you fix that and we're done.
          Hide
          rajeshbabu Rajeshbabu Chintaguntla added a comment -

          Julian Hyde Thanks for modifications. Now rebased my branch to 1306-group-alias and updated the pull request.

          I think "select deptno as dno from emp group by cube(1)" should be valid in the default conformance (because you can cube by expressions).

          This might be big change so as discussed yesterday we can raise another JIRA for this and work separately.

          Show
          rajeshbabu Rajeshbabu Chintaguntla added a comment - Julian Hyde Thanks for modifications. Now rebased my branch to 1306-group-alias and updated the pull request. I think "select deptno as dno from emp group by cube(1)" should be valid in the default conformance (because you can cube by expressions). This might be big change so as discussed yesterday we can raise another JIRA for this and work separately.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/e046be23 ; thanks for the PR, Rajeshbabu Chintaguntla !
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.13.0 (2017-06-26).

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.13.0 (2017-06-26).

            People

            • Assignee:
              rajeshbabu Rajeshbabu Chintaguntla
              Reporter:
              julianhyde Julian Hyde
            • Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development