Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-864

Correlation variable has incorrect row type if it is populated by right side of a Join

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels:
      None

      Description

      Incorrect rowType of correlate variable when correlated sub-query has its left relation as a join.

      DDL

      create table orders (
          order_id varchar(15) not null primary key,
          customer_id\ varchar(10),
          item_id varchar(10),
          price integer,
          quantity integer,
          date timestamp,
          the_year integer);
      
      create table items (
          item_id varchar(10) not null primary key,
          name varchar,
          price integer,
          discount1 integer,
          discount2 integer,
          supplier_id varchar(10),
          description varchar);
      

      Query:

      select order_id 
      from orders o join items i on o.item_id = i.item_id 
      where quantity = (
          select max(quantity) 
          from orders o2 join items i2 on o2.item_id = i2.item_id 
          where i.supplier_id = i2.supplier_id
      )
      

      While executing this query with option "forceDecorrelate=false", I got a RuntimeException:

      java.lang.RuntimeException: java.sql.SQLException: error while executing SQL "select "order_id" from "Join"."OrderTable" o JOIN "Join"."ItemTable" i on o."item_id" = i."item_id" where quantity = (select max(quantity) from "Join"."OrderTable" o2 JOIN "Join"."ItemTable" i2 on o2."item_id" = i2."item_id" where i."supplier_id" = i2."supplier_id")": org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for org.apache.phoenix.expression.CorrelateVariableFieldAccessExpression@462bd95e = supplier_id
      	at org.apache.phoenix.calcite.CalciteIT$Sql.resultIs(CalciteIT.java:187)
      	at org.apache.phoenix.calcite.CalciteIT.testCorrelate(CalciteIT.java:1155)
      Caused by: java.sql.SQLException: error while executing SQL "select "order_id" from "Join"."OrderTable" o JOIN "Join"."ItemTable" i on o."item_id" = i."item_id" where quantity = (select max(quantity) from "Join"."OrderTable" o2 JOIN "Join"."ItemTable" i2 on o2."item_id" = i2."item_id" where i."supplier_id" = i2."supplier_id")": org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for org.apache.phoenix.expression.CorrelateVariableFieldAccessExpression@462bd95e = supplier_id
      	at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
      	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:112)
      	at org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:130)
      	at org.apache.phoenix.calcite.CalciteIT$Sql.resultIs(CalciteIT.java:174)
      	... 28 more
      Caused by: java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for org.apache.phoenix.expression.CorrelateVariableFieldAccessExpression@462bd95e = supplier_id
      	at org.apache.phoenix.calcite.CalciteUtils$3.newExpression(CalciteUtils.java:145)
      	at org.apache.phoenix.calcite.CalciteUtils.toExpression(CalciteUtils.java:807)
      	at org.apache.phoenix.calcite.rel.PhoenixTableScan.implement(PhoenixTableScan.java:187)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixToClientConverter.implement(PhoenixToClientConverter.java:42)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixAbstractJoin.implementInput(PhoenixAbstractJoin.java:49)
      	at org.apache.phoenix.calcite.rel.PhoenixServerJoin.implement(PhoenixServerJoin.java:113)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixServerAggregate.implement(PhoenixServerAggregate.java:57)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixCorrelate.implement(PhoenixCorrelate.java:78)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixFilter.implement(PhoenixFilter.java:59)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixClientProject.implement(PhoenixClientProject.java:59)
      	at org.apache.phoenix.calcite.rel.PhoenixRelImplementorImpl.visitInput(PhoenixRelImplementorImpl.java:50)
      	at org.apache.phoenix.calcite.rel.PhoenixToEnumerableConverter.makePlan(PhoenixToEnumerableConverter.java:88)
      	at org.apache.phoenix.calcite.rel.PhoenixToEnumerableConverter.implement(PhoenixToEnumerableConverter.java:69)
      	at org.apache.calcite.adapter.enumerable.EnumerableRelImplementor.implementRoot(EnumerableRelImplementor.java:99)
      	at org.apache.calcite.adapter.enumerable.EnumerableInterpretable.toBindable(EnumerableInterpretable.java:92)
      	at org.apache.calcite.prepare.CalcitePrepareImpl$CalcitePreparingStmt.implement(CalcitePrepareImpl.java:1050)
      	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:293)
      	at org.apache.calcite.prepare.Prepare.prepareSql(Prepare.java:188)
      	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare2_(CalcitePrepareImpl.java:671)
      	at org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:572)
      	at org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:541)
      	at org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:173)
      	at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:561)
      	at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:477)
      	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:109)
      	... 30 more
      Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for org.apache.phoenix.expression.CorrelateVariableFieldAccessExpression@462bd95e = supplier_id
      	at org.apache.phoenix.schema.TypeMismatchException.newException(TypeMismatchException.java:53)
      	at org.apache.phoenix.expression.ComparisonExpression.create(ComparisonExpression.java:133)
      	at org.apache.phoenix.calcite.CalciteUtils$3.newExpression(CalciteUtils.java:143)
      	... 60 more
      

      The plan was:

      PhoenixToEnumerableConverter
        PhoenixClientProject(order_id=[$0])
          PhoenixFilter(condition=[=($4, $14)])
            PhoenixCorrelate(correlation=[$cor0], joinType=[LEFT], requiredColumns=[{12}])
              PhoenixClientProject(order_id=[$7], CUSTOMER_ID=[$8], ITEM_ID=[$9], PRICE=[$10], QUANTITY=[$11], DATE=[$12], THE_YEAR=[$13], ITEM_ID0=[$0], NAME=[$1], PRICE0=[$2], DISCOUNT1=[$3], DISCOUNT2=[$4], SUPPLIER_ID=[$5], DESCRIPTION=[$6])
                PhoenixToClientConverter
                  PhoenixServerJoin(condition=[=($9, $0)], joinType=[inner])
                    PhoenixTableScan(table=[[phoenix, Join, ItemTable]])
                    PhoenixToClientConverter
                      PhoenixTableScan(table=[[phoenix, Join, OrderTable]])
              PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($4)])
                PhoenixServerJoin(condition=[=($2, $7)], joinType=[inner])
                  PhoenixTableScan(table=[[phoenix, Join, OrderTable]])
                  PhoenixToClientConverter
                    PhoenixTableScan(table=[[phoenix, Join, ItemTable]], filter=[=($cor0.SUPPLIER_ID, $5)])
      

      Debugging this case, I found that $cor0.SUPPLIER_ID was referencing to the 6th (index=5) of the correlating variable. And meanwhile the correlating variable has the same RelDataType as table "items" but not the left relation ("orders" join "items") of the rel Correlate.

      After switching the join tables of the left relation, the query gave the right result. I think the reason was that the leading part (all that matters to RexFieldAccess "$cor0.SUPPLIER_ID") was now the same between table "items" and the left relation ("items" join "orders") of the rel Correlate.

      select order_id 
      from items i join orders o on o.item_id = i.item_id 
      where quantity = (
          select max(quantity) 
          from orders o2 join items i2 on o2.item_id = i2.item_id 
          where i.supplier_id = i2.supplier_id
      )
      

      With de-correlation, the first query also failed because the join condition was referencing to the wrong position. The second query succeed with de-correlation too.

        Issue Links

          Activity

          Hide
          maryannxue Maryann Xue added a comment -

          The code related to this issue:

              RexNode lookupExp(SqlQualified qualified) {
                if (nameToNodeMap != null && qualified.prefixLength == 1) {
                  RexNode node = nameToNodeMap.get(qualified.identifier.names.get(0));
                  if (node == null) {
                    throw Util.newInternal("Unknown identifier '" + qualified.identifier
                        + "' encountered while expanding expression");
                  }
                  return node;
                }
                int[] offsets = {-1};
                final SqlValidatorScope[] ancestorScopes = {null};
                SqlValidatorNamespace foundNs =
                    scope.resolve(qualified.prefix(), ancestorScopes, offsets);
                if (foundNs == null) {
                  return null;
                }
          
                // Found in current query's from list.  Find which from item.
                // We assume that the order of the from clause items has been
                // preserved.
                SqlValidatorScope ancestorScope = ancestorScopes[0];
                boolean isParent = ancestorScope != scope;
                if ((inputs != null) && !isParent) {
                  int offset = offsets[0];
                  final LookupContext rels =
                      new LookupContext(this, inputs, systemFieldList.size());
                  return lookup(offset, rels);
                } else {
                  // We're referencing a relational expression which has not been
                  // converted yet. This occurs when from items are correlated,
                  // e.g. "select from emp as emp join emp.getDepts() as dept".
                  // Create a temporary expression.
                  assert isParent;
                  DeferredLookup lookup =
                      new DeferredLookup(this, qualified.identifier.names.get(0));
                  final CorrelationId correlName = cluster.createCorrel();
                  mapCorrelToDeferred.put(correlName, lookup);
                  final RelDataType rowType = foundNs.getRowType();
                  return rexBuilder.makeCorrel(rowType, correlName);
                }
              }
          

          foundNs is resolved as the table where the column comes from instead of the join relation.

          After CALCITE-816, this would cause RelFieldTrimmer fail to adjust correlate variables according to the trimmed result.

          Show
          maryannxue Maryann Xue added a comment - The code related to this issue: RexNode lookupExp(SqlQualified qualified) { if (nameToNodeMap != null && qualified.prefixLength == 1) { RexNode node = nameToNodeMap.get(qualified.identifier.names.get(0)); if (node == null ) { throw Util.newInternal( "Unknown identifier '" + qualified.identifier + "' encountered while expanding expression" ); } return node; } int [] offsets = {-1}; final SqlValidatorScope[] ancestorScopes = { null }; SqlValidatorNamespace foundNs = scope.resolve(qualified.prefix(), ancestorScopes, offsets); if (foundNs == null ) { return null ; } // Found in current query's from list. Find which from item. // We assume that the order of the from clause items has been // preserved. SqlValidatorScope ancestorScope = ancestorScopes[0]; boolean isParent = ancestorScope != scope; if ((inputs != null ) && !isParent) { int offset = offsets[0]; final LookupContext rels = new LookupContext( this , inputs, systemFieldList.size()); return lookup(offset, rels); } else { // We're referencing a relational expression which has not been // converted yet. This occurs when from items are correlated, // e.g. "select from emp as emp join emp.getDepts() as dept" . // Create a temporary expression. assert isParent; DeferredLookup lookup = new DeferredLookup( this , qualified.identifier.names.get(0)); final CorrelationId correlName = cluster.createCorrel(); mapCorrelToDeferred.put(correlName, lookup); final RelDataType rowType = foundNs.getRowType(); return rexBuilder.makeCorrel(rowType, correlName); } } foundNs is resolved as the table where the column comes from instead of the join relation. After CALCITE-816 , this would cause RelFieldTrimmer fail to adjust correlate variables according to the trimmed result.
          Hide
          julianhyde Julian Hyde added a comment -

          Very similar to CALCITE-842; RelFieldTrimmer gets confused and wrongly removes columns that are used by correlating variables.

          Show
          julianhyde Julian Hyde added a comment - Very similar to CALCITE-842 ; RelFieldTrimmer gets confused and wrongly removes columns that are used by correlating variables.
          Hide
          julianhyde Julian Hyde added a comment -

          Maryann Xue, I have put a fix on julianhyde/master: https://github.com/julianhyde/calcite/commit/d2e0b11b8e37f70d3a8d4e9570aa238a910badc7. Can you please review?

          Show
          julianhyde Julian Hyde added a comment - Maryann Xue , I have put a fix on julianhyde/master: https://github.com/julianhyde/calcite/commit/d2e0b11b8e37f70d3a8d4e9570aa238a910badc7 . Can you please review?
          Hide
          julianhyde Julian Hyde added a comment -

          By the way, you were exactly right about the cause. And the solution was already there: the code offsetOut argument to SqlValidatorScope.resolve. It just took some plumbing to connect them.

          Show
          julianhyde Julian Hyde added a comment - By the way, you were exactly right about the cause. And the solution was already there: the code offsetOut argument to SqlValidatorScope.resolve . It just took some plumbing to connect them.
          Hide
          maryannxue Maryann Xue added a comment -

          Thank you, Julian Hyde, for the fix! It looks good to me. Mind if you add one more test case there:

          select empno, ename, sal, e.deptno, loc
          from "scott".emp as e
          join "scott".dept as d using (deptno)
          where e.sal = (
            select max(sal)
            from "scott".emp as e2
            where e2.deptno = e.deptno);
           EMPNO | ENAME | SAL     | DEPTNO | LOC
          -------+-------+---------+--------+----------
            7698 | BLAKE | 2850.00 |     30 | CHICAGO
            7788 | SCOTT | 3000.00 |     20 | DALLAS
            7839 | KING  | 5000.00 |     10 | NEW YORK
            7902 | FORD  | 3000.00 |     20 | DALLAS
          (4 rows)
          
          !ok
          

          The correlate variable is on the left side of the Join, but would also fail after field trimming. I have confirmed that it has also been fixed on your branch.

          Show
          maryannxue Maryann Xue added a comment - Thank you, Julian Hyde , for the fix! It looks good to me. Mind if you add one more test case there: select empno, ename, sal, e.deptno, loc from "scott" .emp as e join "scott" .dept as d using (deptno) where e.sal = ( select max(sal) from "scott" .emp as e2 where e2.deptno = e.deptno); EMPNO | ENAME | SAL | DEPTNO | LOC -------+-------+---------+--------+---------- 7698 | BLAKE | 2850.00 | 30 | CHICAGO 7788 | SCOTT | 3000.00 | 20 | DALLAS 7839 | KING | 5000.00 | 10 | NEW YORK 7902 | FORD | 3000.00 | 20 | DALLAS (4 rows) !ok The correlate variable is on the left side of the Join, but would also fail after field trimming. I have confirmed that it has also been fixed on your branch.
          Hide
          julianhyde Julian Hyde added a comment -

          Thanks for the review, Maryann Xue. I added the test you suggested.

          Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/d80e26c0.

          Show
          julianhyde Julian Hyde added a comment - Thanks for the review, Maryann Xue . I added the test you suggested. Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/d80e26c0 .
          Hide
          julianhyde Julian Hyde added a comment -

          Resolved in release 1.7.0 (2016-03-22).

          Show
          julianhyde Julian Hyde added a comment - Resolved in release 1.7.0 (2016-03-22).

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              maryannxue Maryann Xue
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development