Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4852

RelToSqlConverter creates unparseable SQL string from right associative multi-way join.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Duplicate
    • None
    • None
    • core
    • None

    Description

      As discussed inĀ CALCITE-35, Calcite cannot parse parenthesized join expressions (such as `select ... from A join (B join C)`).
      But as suggested in CALCITE-2152, those expressions can be converted into parenthesized select on the join expression (`select ... from A join (select ... from B join C)`).

      However, RelToSqlConverter will convert the RelNode representation of right-associative joins into a parenthesized join expression, which is unparseable.

      For example, this RelNode tree

      LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
        LogicalJoin(condition=[=($0, $4)], joinType=[inner])
          LogicalProject(_T3_ID=[$0])
            LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
            // RHS child of right-associate join
            LogicalJoin(condition=[=($0, $2)], joinType=[inner])
              LogicalProject(_T1_ID=[$0])
                LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
              LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
                LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
      

      creates the following unparseable SQL string

      SELECT `t1`.`_t5_id`,
             `t1`.`c0_51`,
             `t0`.`_t1_id`,
             `t1`.`c0_53`,
             `t`.`_t3_id`
      FROM   (SELECT `id` AS `_T3_ID`
              FROM   `query`.`t3`) AS `t`
             INNER JOIN ((SELECT `id` AS `_T1_ID`
                         FROM   `query`.`t1`) AS `t0`
                         INNER JOIN (SELECT `id` AS `_T5_ID`,
                                            `c0_51`,
                                            `c0_53`
                                     FROM   `query`.`t5`) AS `t1`
                                 ON `t0`.`_t1_id` = `t1`.`c0_51`)
                     ON `t`.`_t3_id` = `t1`.`c0_53`
      

      This is an issue, because it is very easy to make such trees, and ideally, all SQL strings generated by Calcite would also be parseable by Calcite.

      To get around this, we found that the insertion of a projection node between a join and it's RHS child (the problematic join) forces RelToSqlConverter to create a parenthesized select statement, which is parseable, and (at least for us) semantically identical.

      For example, this virtually identical tree:

      LogicalProject(_T5_ID=[$2], C0_51=[$3], _T1_ID=[$1], C0_53=[$4], _T3_ID=[$0])
        LogicalJoin(condition=[=($0, $4)], joinType=[inner])
          LogicalProject(_T3_ID=[$0])
            LogicalTableScan(table=[[QUERY, T3]], fields=[[0, 1, 2]])
          // Inserted projection on top of RHS child
          LogicalProject(_T1_ID=[$0], _T5_ID=[$1], C0_51=[$2], C0_53=[$3])
            // RHS child of right-associate join
            LogicalJoin(condition=[=($0, $2)], joinType=[inner])
              LogicalProject(_T1_ID=[$0])
                LogicalTableScan(table=[[QUERY, T1]], fields=[[0, 1, 2]])
              LogicalProject(_T5_ID=[$0], C0_51=[$1], C0_53=[$3])
                LogicalTableScan(table=[[QUERY, T5]], fields=[[0, 1, 2, 3]])
      

      creates the following parseable SQL string

      SELECT `t2`.`_t5_id`,
             `t2`.`c0_51`,
             `t2`.`_t1_id`,
             `t2`.`c0_53`,
             `t`.`_t3_id`
      FROM   (SELECT `id` AS `_T3_ID`
              FROM   `query`.`t3`) AS `t`
             INNER JOIN (SELECT *
                         FROM   (SELECT `id` AS `_T1_ID`
                                 FROM   `query`.`t1`) AS `t0`
                                INNER JOIN (SELECT `id` AS `_T5_ID`,
                                                   `c0_51`,
                                                   `c0_53`
                                            FROM   `query`.`t5`) AS `t1`
                                        ON `t0`.`_t1_id` = `t1`.`c0_51`) AS `t2`
                     ON `t`.`_t3_id` = `t2`.`c0_53`
      

      We solved this our RelToSqlConverter extending class by cloning the parent join and inserting a projection between it and an RHS join:

      override def visit(join: Join): Result = {
          if (join.getRight.isInstanceOf[Join]) {
            super.visit(
              join.copy(
                join.getTraitSet,
                join.getCondition,
                join.getLeft,
                new LogicalProject(
                  join.getCluster,
                  join.getTraitSet,
                  join.getHints,
                  join.getRight,
                  // Construct projection of all expressions from the rhs join
                  join.getRight.getRowType.getFieldList.asScala.map(field => new RexInputRef(field.getIndex, field.getType)).asJava,
                  join.getRight.getRowType
                ),
                join.getJoinType,
                join.isSemiJoinDone
              )
            )
          } else {
            super.visit(join)
          }
        }
      

      (we tried to do it "correctly" by constructing a proper Result object, but found it too difficult. Hopefully someone else can figure out how do it the right way).

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              ian.bertolacci Ian Bertolacci
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: