Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Duplicate
-
None
-
None
-
None
Description
As discussed inĀ CALCITE-35, Calcite cannot parse parenthesized join expressions (such as `select ... from A join (B join C)`).
But as suggested in CALCITE-2152, those expressions can be converted into parenthesized select on the join expression (`select ... from A join (select ... from B join C)`).
However, RelToSqlConverter will convert the RelNode representation of right-associative joins into a parenthesized join expression, which is unparseable.
For example, this RelNode tree
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0]) LogicalJoin(condition=[=($0, $4)], joinType=[inner]) LogicalProject(_T3_ID=[$0]) LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]]) // RHS child of right-associate join LogicalJoin(condition=[=($0, $2)], joinType=[inner]) LogicalProject(_T1_ID=[$0]) LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]]) LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3]) LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
creates the following unparseable SQL string
SELECT `t1`.`_t5_id`, `t1`.`c0_51`, `t0`.`_t1_id`, `t1`.`c0_53`, `t`.`_t3_id` FROM (SELECT `id` AS `_T3_ID` FROM `query`.`t3`) AS `t` INNER JOIN ((SELECT `id` AS `_T1_ID` FROM `query`.`t1`) AS `t0` INNER JOIN (SELECT `id` AS `_T5_ID`, `c0_51`, `c0_53` FROM `query`.`t5`) AS `t1` ON `t0`.`_t1_id` = `t1`.`c0_51`) ON `t`.`_t3_id` = `t1`.`c0_53`
This is an issue, because it is very easy to make such trees, and ideally, all SQL strings generated by Calcite would also be parseable by Calcite.
To get around this, we found that the insertion of a projection node between a join and it's RHS child (the problematic join) forces RelToSqlConverter to create a parenthesized select statement, which is parseable, and (at least for us) semantically identical.
For example, this virtually identical tree:
LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0]) LogicalJoin(condition=[=($0, $4)], joinType=[inner]) LogicalProject(_T3_ID=[$0]) LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]]) // Inserted projection on top of RHS child LogicalProject(_T1_ID=[$0], _T5_ID=[$1], C0_51=[$2], C0_53=[$3]) // RHS child of right-associate join LogicalJoin(condition=[=($0, $2)], joinType=[inner]) LogicalProject(_T1_ID=[$0]) LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]]) LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3]) LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
creates the following parseable SQL string
SELECT `t2`.`_t5_id`, `t2`.`c0_51`, `t2`.`_t1_id`, `t2`.`c0_53`, `t`.`_t3_id` FROM (SELECT `id` AS `_T3_ID` FROM `query`.`t3`) AS `t` INNER JOIN (SELECT * FROM (SELECT `id` AS `_T1_ID` FROM `query`.`t1`) AS `t0` INNER JOIN (SELECT `id` AS `_T5_ID`, `c0_51`, `c0_53` FROM `query`.`t5`) AS `t1` ON `t0`.`_t1_id` = `t1`.`c0_51`) AS `t2` ON `t`.`_t3_id` = `t2`.`c0_53`
We solved this our RelToSqlConverter extending class by cloning the parent join and inserting a projection between it and an RHS join:
override def visit(join: Join): Result = { if (join.getRight.isInstanceOf[Join]) { super.visit( join.copy( join.getTraitSet, join.getCondition, join.getLeft, new LogicalProject( join.getCluster, join.getTraitSet, join.getHints, join.getRight, // Construct projection of all expressions from the rhs join join.getRight.getRowType.getFieldList.asScala.map(field => new RexInputRef(field.getIndex, field.getType)).asJava, join.getRight.getRowType ), join.getJoinType, join.isSemiJoinDone ) ) } else { super.visit(join) } }
(we tried to do it "correctly" by constructing a proper Result object, but found it too difficult. Hopefully someone else can figure out how do it the right way).
Attachments
Issue Links
- duplicates
-
CALCITE-2152 SQL parser unable to parse SQL with nested joins produced by RelToSqlConverter
- Open