Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2130

Converting subquery to join is not always giving equivalent behavior

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.15.0
    • None
    • core
    • None

    Description

      Starting with this DDL:

      create table friends (
        id int not null,
        friend_id int,
        primary key (id),
        foreign key (friend_id) references friends (id)
      );
      
      insert into friends values
      (210, null),
      (209, 210),
      (202, 209),
      (208, 202),
      (207, 209),
      (203, 207),
      (201, null),
      (204, null),
      (205, null),
      (206, 209);
      

      This query looks up a row by id and performs another lookup for id by the friend id:

      SELECT id,
       (SELECT id
        FROM friends as f2
        WHERE f2.friend_id = f1.friend_id) AS friend
      FROM friends as f1
      WHERE id = '203'
      
      id	friend_id	friend
      203	207	203
      

      This query only returns a result for rows with unique values for friend_id. Replacing 203 with 202 leads to this error (MySql): Error Code: 1242. Subquery returns more than 1 row

      I converted this query to a RelNode and then converted it back to SQL using the MySQL dialect:

      Planner aPlanner = Frameworks.getPlanner(aConfig);
      SqlNode aQuery = aPlanner.parse(theSql);
      aQuery = aPlanner.validate(aQuery);
      RelNode aRelNode = aPlanner.rel(aQuery).project();
      RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
      SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
      

      This gives the following plan and SQL:

      LogicalProject(id=[$0], friend_id=[$1], friend=[$2])
        LogicalProject(id=[$0], friend_id=[$1], $f0=[$3])
          LogicalJoin(condition=[=($1, $2)], joinType=[left])
            LogicalFilter(condition=[=($0, '203')])
              JdbcTableScan(table=[[stardog, friends]])
            LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)])
              LogicalProject(friend_id=[$1], id=[$0])
                LogicalProject(id=[$0], friend_id=[$1])
                  LogicalFilter(condition=[IS NOT NULL($1)])
                    JdbcTableScan(table=[[stardog, friends]])
      
      SELECT `t`.`id`, `t`.`friend_id`, `t2`.`$f1` AS `$f0`
      FROM (SELECT *
      FROM `stardog`.`friends`
      WHERE `id` = '203') AS `t`
      LEFT JOIN (SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
      UNION ALL
      SELECT NULL) END AS `$f1`
      FROM `stardog`.`friends`
      WHERE `friend_id` IS NOT NULL
      GROUP BY `friend_id`) AS `t2` ON `t`.`friend_id` = `t2`.`friend_id`
      

      The MySQL implementation for SINGLE_VALUE is the CASE clause that causes an 1242 error when the id count is greater than 1 by invoking a UNION ALL on two NULL rows. In theory, this should return the ID when it is a unique value and throw an error when there are multiple. Instead, MySQL will return the 1242 error for all values of id, including 203.

      Note, the JOIN subquery works if you add a WHERE clause expression to constrain the value of freind_id:

      SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
      UNION ALL
      SELECT NULL) END AS `$f1`
      FROM `stardog`.`friends`
      WHERE `friend_id` IS NOT NULL AND `friend_id` = '207'
      GROUP BY `friend_id`
      

      Substituting friend_id for 207 leads to the 1242 error, as intended.

      This JOIN works on some dialects, but I think it is because different dialects can use different join implementations. If the join performs the JOIN ON SELECT clause without adding a where clause expression to constrain friend_id (collecting all rows with a non-null friend_id), it will encounter this error. Implementations that use a join algorithm that does constrain the friend_id to the desired value will experience the error only when there are multiple rows with the same friend_id, which behaves like the original query.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              pauljackson123 Paul Jackson
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: