Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2129

RelToSqlConverter incorrectly projects aggregate function from sub-query

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: core, jdbc-adapter
    • Labels:
      None

      Description

      Starting with this DDL:

      create table friends (
        id int not null,
        friend_id int,
        primary key (id),
        foreign key (friend_id) references friends (id)
      );
      
      insert into friends values
      (210, null),
      (209, 210),
      (202, 209),
      (208, 202),
      (207, 209),
      (203, 207),
      (201, null),
      (204, null),
      (205, null),
      (206, 209);
      

      This query gives the highest id of friends that share friend_id for each friend:

      SELECT id, friend_id,
       (SELECT max(f2.id)
        FROM friends AS f2
        WHERE f2.friend_id = f1.friend_id) AS foaf_id
      FROM friends AS f1
      
      id	friend	foaf
      201		
      202	209	207
      203	207	203
      204		
      205		
      206	209	207
      207	209	207
      208	202	208
      209	210	209
      210
      

      I convert this query to a RelNode and then converted it back to SQL:

      Planner aPlanner = Frameworks.getPlanner(aConfig);
      SqlNode aQuery = aPlanner.parse(theSql);
      aQuery = aPlanner.validate(aQuery);
      RelNode aRelNode = aPlanner.rel(aQuery).project();
      RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
      SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
      

      This gives the following plan and SQL:

      LogicalProject(id=[$0], friend_id=[$1], foaf_id=[$2])
        LogicalProject(id=[$0], friend_id=[$1], EXPR$0=[$3])
          LogicalJoin(condition=[=($1, $2)], joinType=[left])
            JdbcTableScan(table=[[stardog, friends]])
            LogicalAggregate(group=[{0}], EXPR$0=[MAX($1)])
              LogicalProject(friend_id=[$1], id=[$0])
                LogicalProject(id=[$0], friend_id=[$1])
                  LogicalFilter(condition=[IS NOT NULL($1)])
                    JdbcTableScan(table=[[stardog, friends]])
      
      SELECT `friends`.`id`, `friends`.`friend_id`, MAX(`id`)
      FROM `stardog`.`friends`
      LEFT JOIN (SELECT `friend_id`, MAX(`id`)
      FROM `stardog`.`friends`
      WHERE `friend_id` IS NOT NULL
      GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`
      

      This is a bad conversion. The MAX(`id`) should not be repeated in the outer select. PostgreSQL will complain that the aggregating function requires a group by. MySQL returns the max id that has a non-null friend (208), that id's friend (202), and the max id of all rows (210):

      id	friend	MAX(`id`)
      208	202	210
      

      I think the correct SQL should be:

      SELECT `friends`.`id`, `friends`.`friend_id`, foaf_id
      FROM `stardog`.`friends`
      LEFT JOIN (SELECT `friend_id`, MAX(`id`) AS foaf_id
      FROM `stardog`.`friends`
      WHERE `friend_id` IS NOT NULL
      GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`
      

      There is code in SqlImplementor and RelToSqlConverter that uses an ordinalMap to track what functions are aliases as what identifiers. When the SQL is generated, the identifier is replaced with the function. If all that code is removed, this works as expected (using EXPR$0 as the alias rather than foaf_id).

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                atris Atri Sharma
                Reporter:
                pauljackson123 Paul Jackson
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated: