Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.9.0
    • Component/s: None
    • Labels:
      None

      Description

      Regression from CALCITE-750 Allow nested window aggregates.
      Calcite allows illegal queries instead of raising an appropriate error. When executing the following query calcite does not raise the following error.

      select avg(sum(b)) over (partition by b) from t1;
      ERROR:  Expression 'b' is not being grouped
      

        Issue Links

          Activity

          Hide
          gparai Gautam Kumar Parai added a comment -

          I have created a pull request (https://github.com/apache/calcite/pull/258). Aman Sinha Julian Hyde can you please take a look? Thanks!

          Show
          gparai Gautam Kumar Parai added a comment - I have created a pull request ( https://github.com/apache/calcite/pull/258 ). Aman Sinha Julian Hyde can you please take a look? Thanks!
          Hide
          julianhyde Julian Hyde added a comment -

          Looks basically good. You have missed a couple of use cases.

          1. Remember that a query can be aggregating even if it doesn't have a GROUP BY clause: it could have a HAVING clause or an aggregate in the SELECT clause. Need tests for these cases, and use !SqlValidator.isAggregate(select) rather than select.getGroupBy() == null.

          2. You are allowed to use an expression based on group by columns. For example, select avg(sum(sal)) over (partition by empno) from emp group by deptno + 1 is legal.

          Also, remove carets from valid SQL queries in test cases, and please remember to use continuation indent 4, not 8.

          Show
          julianhyde Julian Hyde added a comment - Looks basically good. You have missed a couple of use cases. 1. Remember that a query can be aggregating even if it doesn't have a GROUP BY clause: it could have a HAVING clause or an aggregate in the SELECT clause. Need tests for these cases, and use !SqlValidator.isAggregate(select) rather than select.getGroupBy() == null . 2. You are allowed to use an expression based on group by columns. For example, select avg(sum(sal)) over (partition by empno) from emp group by deptno + 1 is legal. Also, remove carets from valid SQL queries in test cases, and please remember to use continuation indent 4, not 8.
          Hide
          julianhyde Julian Hyde added a comment -

          I reviewed the patch again. Looks good. I added some more test cases, and they passed.

          However I realized there might be a much simpler fix. Take a look at the lines in SqlValidatorImpl.registerQuery:

                if (isAggregate(select)) {
                  aggScope =
                      new AggregatingSelectScope(selectScope, select, false);
                  selectScopes.put(select, aggScope);
                } else {
                  selectScopes.put(select, selectScope);
                }
          

          I have a feeling that isAggregate(select) is returning false for, say, select avg(sum(sal)) over (partition by empno) from emp, but it should be returning true. And if it did, your big new else block is not necessary.

          Can you please take a look at fixing isAggregate?

          By the way, here are the new tests:

              sql("select avg(sum(sal)) over ^(partition by empno)^ from emp")
                  .fails("Expression 'EMP.EMPNO' is not being grouped");
          
              sql("select avg(sum(sal)) over (partition by ^empno^)\n"
                  + "from emp group by deptno")
                  .fails("Expression 'EMPNO' is not being grouped");
          
              sql("select avg(sum(sal)) over (partition by deptno order by ^empno^)\n"
                  + "from emp group by deptno")
                  .fails("Expression 'EMPNO' is not being grouped");
          
              // expression is OK
              sql("select avg(sum(sal)) over (partition by 10 - deptno\n"
                  + "   order by deptno / 2 desc)\n"
                  + "from emp group by deptno").ok();
          
              // expression involving non-GROUP column is not OK
              sql("select avg(sum(sal)) over (partition by deptno + ^empno^)\n"
                  + "from emp group by deptno")
                  .fails("Expression 'EMPNO' is not being grouped");
          
              sql("select avg(sum(sal)) over (partition by empno + deptno)\n"
                  + "from emp group by empno + deptno").ok();
          
              sql("select avg(sum(sal)) over (partition by empno + deptno + 1)\n"
                  + "from emp group by empno + deptno").ok();
          
              sql("select avg(sum(sal)) over (partition by ^deptno^ + 1)\n"
                  + "from emp group by empno + deptno")
                  .fails("Expression 'DEPTNO' is not being grouped");
          
              sql("select avg(empno + deptno) over (partition by empno + deptno + 1),\n"
                  + "  count(empno + deptno) over (partition by empno + deptno + 1)\n"
                  + "from emp group by empno + deptno").ok();
          
          Show
          julianhyde Julian Hyde added a comment - I reviewed the patch again. Looks good. I added some more test cases, and they passed. However I realized there might be a much simpler fix. Take a look at the lines in SqlValidatorImpl.registerQuery: if (isAggregate(select)) { aggScope = new AggregatingSelectScope(selectScope, select, false ); selectScopes.put(select, aggScope); } else { selectScopes.put(select, selectScope); } I have a feeling that isAggregate(select) is returning false for, say, select avg(sum(sal)) over (partition by empno) from emp , but it should be returning true. And if it did, your big new else block is not necessary. Can you please take a look at fixing isAggregate ? By the way, here are the new tests: sql( "select avg(sum(sal)) over ^(partition by empno)^ from emp" ) .fails( "Expression 'EMP.EMPNO' is not being grouped" ); sql( "select avg(sum(sal)) over (partition by ^empno^)\n" + "from emp group by deptno" ) .fails( "Expression 'EMPNO' is not being grouped" ); sql( "select avg(sum(sal)) over (partition by deptno order by ^empno^)\n" + "from emp group by deptno" ) .fails( "Expression 'EMPNO' is not being grouped" ); // expression is OK sql( "select avg(sum(sal)) over (partition by 10 - deptno\n" + " order by deptno / 2 desc)\n" + "from emp group by deptno" ).ok(); // expression involving non-GROUP column is not OK sql( "select avg(sum(sal)) over (partition by deptno + ^empno^)\n" + "from emp group by deptno" ) .fails( "Expression 'EMPNO' is not being grouped" ); sql( "select avg(sum(sal)) over (partition by empno + deptno)\n" + "from emp group by empno + deptno" ).ok(); sql( "select avg(sum(sal)) over (partition by empno + deptno + 1)\n" + "from emp group by empno + deptno" ).ok(); sql( "select avg(sum(sal)) over (partition by ^deptno^ + 1)\n" + "from emp group by empno + deptno" ) .fails( "Expression 'DEPTNO' is not being grouped" ); sql( "select avg(empno + deptno) over (partition by empno + deptno + 1),\n" + " count(empno + deptno) over (partition by empno + deptno + 1)\n" + "from emp group by empno + deptno" ).ok();
          Hide
          gparai Gautam Kumar Parai added a comment -

          Thanks for the testcases Julian Hyde. I have updated the pull request (https://github.com/apache/calcite/pull/258). Aman Sinha Julian Hyde can you please take a look? Thanks!

          Show
          gparai Gautam Kumar Parai added a comment - Thanks for the testcases Julian Hyde . I have updated the pull request ( https://github.com/apache/calcite/pull/258 ). Aman Sinha Julian Hyde can you please take a look? Thanks!
          Hide
          gparai Gautam Kumar Parai added a comment - - edited

          Julian Hyde I found a couple of existing issues while debugging this issue.

          1. We do not seem to verify for window clause correctness in aggregating scope.

          SELECT sum(empno), max(empno) OVER (order by deptno) 
          FROM emp 
          GROUP BY empno 
          gives error: Expression 'DEPTNO' is not being grouped
          
          SELECT sum(empno), max(empno) OVER w 
          FROM emp 
          GROUP BY empno 
          WINDOW w as (order by deptno)
          gives no error:
          

          Although, the new code fixes the issue.

          2. We do not seem to verify for correctness of rank() (maybe other operators?) prior to verifying aggregating scope checks. We incorrectly think rank() without over() is an aggregate and proceed to do agg scope checks for the rest. We should fail first at "OVER clause is necessary for window functions"
          This was referred to by Aman Sinha. I had to modify the existing testcase but it should not have required to do so.

          Do you think we should file separate JIRAs for the two issues?

          Show
          gparai Gautam Kumar Parai added a comment - - edited Julian Hyde I found a couple of existing issues while debugging this issue. 1. We do not seem to verify for window clause correctness in aggregating scope. SELECT sum(empno), max(empno) OVER (order by deptno) FROM emp GROUP BY empno gives error: Expression 'DEPTNO' is not being grouped SELECT sum(empno), max(empno) OVER w FROM emp GROUP BY empno WINDOW w as (order by deptno) gives no error: Although, the new code fixes the issue. 2. We do not seem to verify for correctness of rank() (maybe other operators?) prior to verifying aggregating scope checks. We incorrectly think rank() without over() is an aggregate and proceed to do agg scope checks for the rest. We should fail first at "OVER clause is necessary for window functions" This was referred to by Aman Sinha . I had to modify the existing testcase but it should not have required to do so. Do you think we should file separate JIRAs for the two issues?
          Hide
          julianhyde Julian Hyde added a comment - - edited

          It's really all part of the same issue: making windowed aggregates (OVER) work in aggregate queries (those with GROUP BY, HAVING or non-OVER aggregates). I don't particularly mind whether it's a new JIRA case or just this one, but it all needs to be fixed.

          Also, winSql("select count( * ) over () from emp group by deptno").ok(); fails with AssertionError: star should have been expanded. I don't remember whether that has a JIRA case, but it's the same issue.

          Show
          julianhyde Julian Hyde added a comment - - edited It's really all part of the same issue: making windowed aggregates (OVER) work in aggregate queries (those with GROUP BY, HAVING or non-OVER aggregates). I don't particularly mind whether it's a new JIRA case or just this one, but it all needs to be fixed. Also, winSql("select count( * ) over () from emp group by deptno").ok(); fails with AssertionError: star should have been expanded . I don't remember whether that has a JIRA case, but it's the same issue.
          Hide
          gparai Gautam Kumar Parai added a comment - - edited

          Julian Hyde I agree these are related to the same issue you mentioned above. I have created CALCITE-1340 and assigned it to myself to address these issues.

          However, the current JIRA is targeted towards nested window aggregate issues which were caused by CALCITE-750 (Support for nested window aggregates). The current PR fixes the issues which are in the scope of this JIRA. Could you please review the PR? Thanks!

          Show
          gparai Gautam Kumar Parai added a comment - - edited Julian Hyde I agree these are related to the same issue you mentioned above. I have created CALCITE-1340 and assigned it to myself to address these issues. However, the current JIRA is targeted towards nested window aggregate issues which were caused by CALCITE-750 (Support for nested window aggregates). The current PR fixes the issues which are in the scope of this JIRA. Could you please review the PR? Thanks!
          Hide
          julianhyde Julian Hyde added a comment -

          The root cause was that CALCITE-750 was incomplete, because it was inadequately tested. Let's not just chip away at the problem. I'll commit when we have all the issues solved.

          Show
          julianhyde Julian Hyde added a comment - The root cause was that CALCITE-750 was incomplete, because it was inadequately tested. Let's not just chip away at the problem. I'll commit when we have all the issues solved.
          Hide
          gparai Gautam Kumar Parai added a comment - - edited

          Julian Hyde I have updated the pull request (https://github.com/apache/calcite/pull/258) which also address the issues in CALCITE-1340. Can you please take a look? Thanks!

          Show
          gparai Gautam Kumar Parai added a comment - - edited Julian Hyde I have updated the pull request ( https://github.com/apache/calcite/pull/258 ) which also address the issues in CALCITE-1340 . Can you please take a look? Thanks!
          Hide
          julianhyde Julian Hyde added a comment -

          Looks good. I fixed up, by not regarding RANK as an aggregate function (as I suggested in CALCITE-1340), and it simplified a lot. Can you please review https://github.com/julianhyde/calcite/commit/59a3e402602673685fc7279ab82c0b1e186790a7 (currently on my master branch).

          Show
          julianhyde Julian Hyde added a comment - Looks good. I fixed up, by not regarding RANK as an aggregate function (as I suggested in CALCITE-1340 ), and it simplified a lot. Can you please review https://github.com/julianhyde/calcite/commit/59a3e402602673685fc7279ab82c0b1e186790a7 (currently on my master branch).
          Hide
          gparai Gautam Kumar Parai added a comment -

          Thanks for updating the PR Julian Hyde. I reviewed the PR - please take a look.

          Show
          gparai Gautam Kumar Parai added a comment - Thanks for updating the PR Julian Hyde . I reviewed the PR - please take a look.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/69839c37 ; thanks for the PR, Gautam Kumar Parai !
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.9.0 (2016-09-22)

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.9.0 (2016-09-22)

            People

            • Assignee:
              gparai Gautam Kumar Parai
              Reporter:
              gparai Gautam Kumar Parai
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development