Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.15.0
-
None
-
None
Description
Starting with this DDL:
create table friends ( id int not null, friend_id int, primary key (id), foreign key (friend_id) references friends (id) ); insert into friends values (210, null), (209, 210), (202, 209), (208, 202), (207, 209), (203, 207), (201, null), (204, null), (205, null), (206, 209);
This query looks up a row by id and performs another lookup for id by the friend id:
SELECT id, (SELECT id FROM friends as f2 WHERE f2.friend_id = f1.friend_id) AS friend FROM friends as f1 WHERE id = '203' id friend_id friend 203 207 203
This query only returns a result for rows with unique values for friend_id. Replacing 203 with 202 leads to this error (MySql): Error Code: 1242. Subquery returns more than 1 row
I converted this query to a RelNode and then converted it back to SQL using the MySQL dialect:
Planner aPlanner = Frameworks.getPlanner(aConfig);
SqlNode aQuery = aPlanner.parse(theSql);
aQuery = aPlanner.validate(aQuery);
RelNode aRelNode = aPlanner.rel(aQuery).project();
RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
This gives the following plan and SQL:
LogicalProject(id=[$0], friend_id=[$1], friend=[$2]) LogicalProject(id=[$0], friend_id=[$1], $f0=[$3]) LogicalJoin(condition=[=($1, $2)], joinType=[left]) LogicalFilter(condition=[=($0, '203')]) JdbcTableScan(table=[[stardog, friends]]) LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)]) LogicalProject(friend_id=[$1], id=[$0]) LogicalProject(id=[$0], friend_id=[$1]) LogicalFilter(condition=[IS NOT NULL($1)]) JdbcTableScan(table=[[stardog, friends]]) SELECT `t`.`id`, `t`.`friend_id`, `t2`.`$f1` AS `$f0` FROM (SELECT * FROM `stardog`.`friends` WHERE `id` = '203') AS `t` LEFT JOIN (SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL UNION ALL SELECT NULL) END AS `$f1` FROM `stardog`.`friends` WHERE `friend_id` IS NOT NULL GROUP BY `friend_id`) AS `t2` ON `t`.`friend_id` = `t2`.`friend_id`
The MySQL implementation for SINGLE_VALUE is the CASE clause that causes an 1242 error when the id count is greater than 1 by invoking a UNION ALL on two NULL rows. In theory, this should return the ID when it is a unique value and throw an error when there are multiple. Instead, MySQL will return the 1242 error for all values of id, including 203.
Note, the JOIN subquery works if you add a WHERE clause expression to constrain the value of freind_id:
SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL UNION ALL SELECT NULL) END AS `$f1` FROM `stardog`.`friends` WHERE `friend_id` IS NOT NULL AND `friend_id` = '207' GROUP BY `friend_id`
Substituting friend_id for 207 leads to the 1242 error, as intended.
This JOIN works on some dialects, but I think it is because different dialects can use different join implementations. If the join performs the JOIN ON SELECT clause without adding a where clause expression to constrain friend_id (collecting all rows with a non-null friend_id), it will encounter this error. Implementations that use a join algorithm that does constrain the friend_id to the desired value will experience the error only when there are multiple rows with the same friend_id, which behaves like the original query.
Attachments
Issue Links
- is related to
-
CALCITE-2129 RelToSqlConverter incorrectly projects aggregate function from sub-query
- Open
- relates to
-
CALCITE-2131 Provide an option to NOT convert subquery to JOIN
- Open