Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1946

Dialects lacking support for nested aggregations should use sub select instead

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.13.0
    • Fix Version/s: 1.15.0
    • Component/s: jdbc-adapter
    • Labels:
      None

      Description

      Below query, containing 2 SUM aggregation functions and sub-select, gets converted to a SQL that fails when executed in various SQL engines.

      SQL

      SELECT SUM("net_weight1") AS "net_weight_converted"
      FROM (
        SELECT SUM(" net_weight") AS "net_weight1"
        FROM "foodmart"."product"
        GROUP BY "product_id")
      

      Expected:

      SELECT SUM("net_weight1") AS "net_weight_converted"
      FROM (SELECT
              "product_id",
              SUM("net_weight") AS "net_weight1"
            FROM "foodmart"."product"
            GROUP BY "product_id") AS "t0"
      

      Actual:

      SELECT SUM(SUM("net_weight")) AS "net_weight_converted"
      FROM "foodmart"."product"
      GROUP BY "product_id"
      

      Returned errors:

      • MySQL 5.1.73
        Invalid use of group function
      • MemSQL 5.5.8:
        [HY000][1111] Invalid use of group function
      • HP Vertica: 7.2.1-0:
        [42803][2135] [Vertica][VJDBC](2135) ERROR: Aggregate function calls may not be nested java.lang.RuntimeException: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](2135) ERROR: Aggregate function calls may not be nested
      • PostgreSQL 9.6:
        ERROR: aggregate function calls cannot be nested

      Test Case

        @Test public void testEnginesLackingSupportForNestedAggregationsShouldUseSubSelectInstead() {
          final String query = "select\n"
              + "    SUM(\"net_weight1\") as \"net_weight_converted\"\n"
              + "  from ("
              + "    select\n"
              + "       SUM(\"net_weight\") as \"net_weight1\"\n"
              + "    from \"foodmart\".\"product\"\n"
              + "    group by \"product_id\")";
          final String expectedOracle = "SELECT SUM(SUM(\"net_weight\")) \"net_weight_converted\"\n"
              + "FROM \"foodmart\".\"product\"\n"
              + "GROUP BY \"product_id\"";
          final String expectedMySQL = "SELECT SUM(`net_weight1`) AS `net_weight_converted`\n"
              + "FROM (SELECT `product_id`, SUM(`net_weight`) AS `net_weight1`\n"
              + "FROM `foodmart`.`product`\n"
              + "GROUP BY `product_id`) AS `t0`";
          final String expectedVertica = "SELECT SUM(\"net_weight1\") AS \"net_weight_converted\"\n"
              + "FROM (SELECT \"product_id\", SUM(\"net_weight\") AS \"net_weight1\"\n"
              + "FROM \"foodmart\".\"product\"\n"
              + "GROUP BY \"product_id\") AS \"t0\"";
          final String expectedPostgresql = "SELECT SUM(\"net_weight1\") AS \"net_weight_converted\"\n"
              + "FROM (SELECT \"product_id\", SUM(\"net_weight\") AS \"net_weight1\"\n"
              + "FROM \"foodmart\".\"product\"\n"
              + "GROUP BY \"product_id\") AS \"t0\"";
          sql(query)
              .dialect(DatabaseProduct.ORACLE.getDialect())
              .ok(expectedOracle)
              .dialect(DatabaseProduct.MYSQL.getDialect())
              .ok(expectedMySQL)
              .dialect(DatabaseProduct.VERTICA.getDialect())
              .ok(expectedVertica)
              .dialect(DatabaseProduct.POSTGRESQL.getDialect())
              .ok(expectedPostgresql);
       }
      

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.15.0 (2017-12-11).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.15.0 (2017-12-11).
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/4d0e83e7 . Thanks for the PR, Pawel Ruchaj !
        Hide
        julianhyde Julian Hyde added a comment - - edited

        Reviewing https://github.com/apache/calcite/commit/d31c2a5: looks much better. I will fully review, test and commit after 1.14.

        It looks as if hasNestedAggregations won't handle aggregate functions that are nested in other expressions, e.g. SELECT SUM(2 * COUNT(x)) FROM Emp. It might be helpful to use a visitor, something like Util.OverFinder.

        Show
        julianhyde Julian Hyde added a comment - - edited Reviewing https://github.com/apache/calcite/commit/d31c2a5: looks much better. I will fully review, test and commit after 1.14. It looks as if hasNestedAggregations won't handle aggregate functions that are nested in other expressions, e.g. SELECT SUM(2 * COUNT(x)) FROM Emp . It might be helpful to use a visitor, something like Util.OverFinder.
        Hide
        pawel.ruchaj Pawel Ruchaj added a comment -

        More checks added: https://github.com/apache/calcite/pull/520/files. The tests you suggested also pass Julian Hyde

        Show
        pawel.ruchaj Pawel Ruchaj added a comment - More checks added: https://github.com/apache/calcite/pull/520/files . The tests you suggested also pass Julian Hyde
        Hide
        julianhyde Julian Hyde added a comment -

        It is sufficient to see what is below you in the tree. It should be straightforward change from the current code.

        Show
        julianhyde Julian Hyde added a comment - It is sufficient to see what is below you in the tree. It should be straightforward change from the current code.
        Hide
        pawel.ruchaj Pawel Ruchaj added a comment -

        I think we need a code which checks that:

        • current SQL dialect does not support nested aggregations
        • looks at the current and the next relation, to see if current and next relations are aggregations.
        • if the same field is in use in both aggregations and also used inside aggregation functions
        • only then the new layer is created in such scenarios.

        The problem with above algo is that due to recursive nature of this code I have no access to see next relation.
        WDYT Julian Hyde

        Show
        pawel.ruchaj Pawel Ruchaj added a comment - I think we need a code which checks that: current SQL dialect does not support nested aggregations looks at the current and the next relation, to see if current and next relations are aggregations. if the same field is in use in both aggregations and also used inside aggregation functions only then the new layer is created in such scenarios. The problem with above algo is that due to recursive nature of this code I have no access to see next relation. WDYT Julian Hyde
        Hide
        julianhyde Julian Hyde added a comment -

        Reviewing https://github.com/apache/calcite/pull/520/commits/368dbc1f8ce6170253c8e5a5050185648f5ec073. Your change goes a bit too far. It forces a sub-select where one is not necessary, for example the following fails after your change.

          @Test public void testSelectFilterSubQueryProject() {
            // Even though specified as a sub-query, a single-level query can be sent
            // to JDBC
            String query = "select \"product_class_id\" * 2 as p\n"
                + "from (\n"
                + "  select *\n"
                + "  from \"foodmart\".\"product\"\n"
                + "  where \"product_id\" > 100)";
            final String expected = "SELECT \"product_class_id\" * 2 AS \"P\"\n"
                + "FROM \"foodmart\".\"product\"\n"
                + "WHERE \"product_id\" > 100";
            final String expectedMysql = expected.replace('"', '`');
            sql(query).ok(expected)
                .dialect(DatabaseProduct.POSTGRESQL.getDialect())
                .ok(expected)
                .dialect(DatabaseProduct.MYSQL.getDialect())
                .ok(expectedMysql);
        
            // Do not split a single-level query into a sub-query
            String query2 = "select \"product_class_id\" * 2 as p\n"
                + "from \"foodmart\".\"product\"\n"
                + "  where \"product_id\" > 100";
            sql(query2).ok(expected)
                .dialect(DatabaseProduct.POSTGRESQL.getDialect())
                .ok(expected)
                .dialect(DatabaseProduct.MYSQL.getDialect())
                .ok(expectedMysql);
          }
        

        Also please document supportsNestedAggregations.

        Show
        julianhyde Julian Hyde added a comment - Reviewing https://github.com/apache/calcite/pull/520/commits/368dbc1f8ce6170253c8e5a5050185648f5ec073 . Your change goes a bit too far. It forces a sub-select where one is not necessary, for example the following fails after your change. @Test public void testSelectFilterSubQueryProject() { // Even though specified as a sub-query, a single-level query can be sent // to JDBC String query = "select \" product_class_id\ " * 2 as p\n" + "from (\n" + " select *\n" + " from \" foodmart\ ".\" product\ "\n" + " where \" product_id\ " > 100)" ; final String expected = "SELECT \" product_class_id\ " * 2 AS \" P\ "\n" + "FROM \" foodmart\ ".\" product\ "\n" + "WHERE \" product_id\ " > 100" ; final String expectedMysql = expected.replace('"', '`'); sql(query).ok(expected) .dialect(DatabaseProduct.POSTGRESQL.getDialect()) .ok(expected) .dialect(DatabaseProduct.MYSQL.getDialect()) .ok(expectedMysql); // Do not split a single-level query into a sub-query String query2 = "select \" product_class_id\ " * 2 as p\n" + "from \" foodmart\ ".\" product\ "\n" + " where \" product_id\ " > 100" ; sql(query2).ok(expected) .dialect(DatabaseProduct.POSTGRESQL.getDialect()) .ok(expected) .dialect(DatabaseProduct.MYSQL.getDialect()) .ok(expectedMysql); } Also please document supportsNestedAggregations.
        Hide
        pawel.ruchaj Pawel Ruchaj added a comment -
        Show
        pawel.ruchaj Pawel Ruchaj added a comment - Julian Hyde Please take a look https://github.com/apache/calcite/pull/520
        Hide
        pawel.ruchaj Pawel Ruchaj added a comment - - edited

        Triggering needNew=true in SqlImplementor.builder when building LogicalProject#9 creates required inner select

        LogicalAggregate#10(groupSets={}, NET_WEIGHT=[SUM($0)])
          LogicalProject#9(exps=[$1])
            LogicalAggregate#8(groupSets={1},   NET_WEIGHT=[SUM($1)])
              LogicalProject#7(product_id=[$1], net_weight=[$7])
                JdbcTableScan#6(table=[[foodmart, product]])
        
        Show
        pawel.ruchaj Pawel Ruchaj added a comment - - edited Triggering needNew=true in SqlImplementor.builder when building LogicalProject#9 creates required inner select LogicalAggregate#10(groupSets={}, NET_WEIGHT=[SUM($0)]) LogicalProject#9(exps=[$1]) LogicalAggregate#8(groupSets={1}, NET_WEIGHT=[SUM($1)]) LogicalProject#7(product_id=[$1], net_weight=[$7]) JdbcTableScan#6(table=[[foodmart, product]])
        Hide
        pawel.ruchaj Pawel Ruchaj added a comment -

        It is valid in Oracle: http://rextester.com/LAJS17477

        Show
        pawel.ruchaj Pawel Ruchaj added a comment - It is valid in Oracle: http://rextester.com/LAJS17477
        Hide
        julianhyde Julian Hyde added a comment -

        That SQL is, I believe, valid for Oracle. So, this is a dialect-specific issue.

        Show
        julianhyde Julian Hyde added a comment - That SQL is, I believe, valid for Oracle. So, this is a dialect-specific issue.

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            pawel.ruchaj Pawel Ruchaj
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development