[CALCITE-2129] RelToSqlConverter incorrectly projects aggregate function from sub-query - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.15.0
Fix Version/s: None
Component/s: core, jdbc-adapter
Labels:
None

Description

Starting with this DDL:

create table friends (
  id int not null,
  friend_id int,
  primary key (id),
  foreign key (friend_id) references friends (id)
);

insert into friends values
(210, null),
(209, 210),
(202, 209),
(208, 202),
(207, 209),
(203, 207),
(201, null),
(204, null),
(205, null),
(206, 209);

This query gives the highest id of friends that share friend_id for each friend:

SELECT id, friend_id,
 (SELECT max(f2.id)
  FROM friends AS f2
  WHERE f2.friend_id = f1.friend_id) AS foaf_id
FROM friends AS f1

id	friend	foaf
201		
202	209	207
203	207	203
204		
205		
206	209	207
207	209	207
208	202	208
209	210	209
210

I convert this query to a RelNode and then converted it back to SQL:

Planner aPlanner = Frameworks.getPlanner(aConfig);
SqlNode aQuery = aPlanner.parse(theSql);
aQuery = aPlanner.validate(aQuery);
RelNode aRelNode = aPlanner.rel(aQuery).project();
RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();

This gives the following plan and SQL:

LogicalProject(id=[$0], friend_id=[$1], foaf_id=[$2])
  LogicalProject(id=[$0], friend_id=[$1], EXPR$0=[$3])
    LogicalJoin(condition=[=($1, $2)], joinType=[left])
      JdbcTableScan(table=[[stardog, friends]])
      LogicalAggregate(group=[{0}], EXPR$0=[MAX($1)])
        LogicalProject(friend_id=[$1], id=[$0])
          LogicalProject(id=[$0], friend_id=[$1])
            LogicalFilter(condition=[IS NOT NULL($1)])
              JdbcTableScan(table=[[stardog, friends]])

SELECT `friends`.`id`, `friends`.`friend_id`, MAX(`id`)
FROM `stardog`.`friends`
LEFT JOIN (SELECT `friend_id`, MAX(`id`)
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL
GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`

This is a bad conversion. The MAX(`id`) should not be repeated in the outer select. PostgreSQL will complain that the aggregating function requires a group by. MySQL returns the max id that has a non-null friend (208), that id's friend (202), and the max id of all rows (210):

id	friend	MAX(`id`)
208	202	210

I think the correct SQL should be:

SELECT `friends`.`id`, `friends`.`friend_id`, foaf_id
FROM `stardog`.`friends`
LEFT JOIN (SELECT `friend_id`, MAX(`id`) AS foaf_id
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL
GROUP BY `friend_id`) AS `t1` ON `friends`.`friend_id` = `t1`.`friend_id`

There is code in SqlImplementor and RelToSqlConverter that uses an ordinalMap to track what functions are aliases as what identifiers. When the SQL is generated, the identifier is replaced with the function. If all that code is removed, this works as expected (using EXPR$0 as the alias rather than foaf_id).

Attachments

Issue Links

relates to

CALCITE-2130 Converting subquery to join is not always giving equivalent behavior

Open

CALCITE-2131 Provide an option to NOT convert subquery to JOIN

Open

Activity

People

Assignee:: Atri Sharma

Reporter:: Paul Jackson

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 10/Jan/18 17:56

Updated:: 22/May/18 17:29