Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-5209

Proper sub-query handling if it is used inside select list and group by

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.30.0
    • 1.33.0
    • None

    Description

      SqlRelConverter.getSubQuery uses reference comparing in order to find SubQuery instance.
      But it does not work in case when select list has column which refers to the column inside `GROUP BY` clause.
      For example:

      SELECT deptno IN (1,2) FROM emp.deptno GROUP BY deptno IN (1,2);

      In this case `SqlNode` inside select list and `SqlNode` inside `GROUP BY` are different SqlNode instances. The reference comparing won't work.

      The following bug is a result of this behaviour:

      For these examples assume that:

      • `SqlToRelConverter.Config.config().withInSubQueryThreshold(5)` has been set.
      • The source table has 4 columns (Column_0 through Column_3)
      • Each column is of type BIGINT NOT NULL

      The failing query is:

      select
          case
              when Column_3 in (1, 2, 3, 4, 5) THEN 1
              else 0
          end
      from T1000
      group by
          case
              when Column_3 in (1, 2, 3, 4, 5) THEN 1
              else 0
          end
      

      The exception is:

      3
      java.lang.ArrayIndexOutOfBoundsException: 3
      	at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:75)
      	at org.apache.calcite.tools.RelBuilder.inferAlias(RelBuilder.java:2163)
      	at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1956)
      	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1797)
      	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1769)
      	at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1758)
      	at org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.register(SqlToRelConverter.java:4680)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.substituteSubQuery(SqlToRelConverter.java:1268)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.replaceSubQueries(SqlToRelConverter.java:1127)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:3325)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:3192)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:738)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:664)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3589)
      	at org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:589)
      	at org.apache.calcite.prepare.PlannerImpl.rel(PlannerImpl.java:259)
      

      The table could contain N >= 4 columns, and the use of any column at or after index 3 will also cause this exception.
      The use of any column before index 3 gives the RelNode tree:

      66:LogicalProject(EXPR$0=[$0])
        65:LogicalJoin(condition=[=($3, $4)], joinType=[left])
          61:LogicalProject(EXPR$0=[$0], $f0=[$1], $f1=[$2], $f10=[$2])
            60:LogicalJoin(condition=[true], joinType=[inner])
              56:LogicalAggregate(group=[{0}])
                55:LogicalProject(EXPR$0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
                  54:LogicalJoin(condition=[=($6, $7)], joinType=[left])
                    50:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_20=[$2])
                      49:LogicalJoin(condition=[true], joinType=[inner])
                        45:TableScan(...)
                        48:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
                          47:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                            46:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
                    53:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                      52:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                        51:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
              59:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
                58:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                  57:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
          64:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
            63:LogicalProject(ROW_VALUE=[$0], $f1=[true])
              62:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
      

      If the number of predicates is less than the subquery threshold, there are no issues.

      Additionally, the below queries also produce RelNode trees
      Using an alias:

      select
          case
              when Column_3 in (1, 2, 3, 4, 5) THEN 1
              else 0
          end as CASE_ALIAS
      from T1000
      group by
          CASE_ALIAS
      
      90:LogicalAggregate(group=[{0}])
        89:LogicalProject(CASE_ALIAS=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
          88:LogicalJoin(condition=[=($6, $7)], joinType=[left])
            84:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
              83:LogicalJoin(condition=[true], joinType=[inner])
                79:TableScan(...)
                82:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
                  81:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                    80:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
            87:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
              86:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                85:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
      

      Not using the group-by value in the projection:

      select
          count(*)
      from T1000
      group by
          case
              when Column_3 in (1, 2, 3, 4, 5) THEN 1
              else 0
          end
      
      116:LogicalProject(EXPR$0=[$1])
        115:LogicalAggregate(group=[{0}], EXPR$0=[COUNT()])
          114:LogicalProject($f0=[CASE(CAST(OR(AND(IS NOT NULL($8), <>($4, 0)), AND(<($5, $4), null, <>($4, 0), IS NULL($8)))):BOOLEAN NOT NULL, 1, 0)])
            113:LogicalJoin(condition=[=($6, $7)], joinType=[left])
              109:LogicalProject(Column_0=[$0], Column_1=[$1], Column_2=[$2], Column_3=[$3], $f0=[$4], $f1=[$5], Column_30=[$3])
                108:LogicalJoin(condition=[true], joinType=[inner])
                  104:TableScan(...)
                  107:LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
                    106:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                      105:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
              112:LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
                111:LogicalProject(ROW_VALUE=[$0], $f1=[true])
                  110:LogicalValues(tuples=[[{ 1 }, { 2 }, { 3 }, { 4 }, { 5 }]])
      

      Attachments

        Issue Links

          Activity

            People

              dmsysolyatin Dmitry Sysolyatin
              ian.bertolacci Ian Bertolacci
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1.5h
                  1.5h