Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-2028

SubQueryRemoveRule should create Join, not Correlate, for un-correlated sub-queries

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15.0
    • Component/s: None
    • Labels:

      Description

      I'm doing some work about running TPC-H queries on Flink, and encounter a problem that q20 can't be de-correlated appropriately when ConfigBuilder.withExpand(false).
      TPC-H q20 query has a WHERE clause composed of an IN predicate with an un-correlated subquery and an EXISTS predicate with a correlated subquery.
      Therefore, in SubQueryRemoveRule.FILTER, the "variablesSet" from the entire Filter rel, referenced by the correlated subquery of EXISTS predicate actually, applies to the un-correlated subquery of IN predicate, and finally generates an inner-join LogicalCorrelate which will be ignored in RelDecorrelator.

      1. The q20 query:
        
        -- tpch20
        select
          s.s_name,
          s.s_address
        from
          supplier s,
          nation n
        where
          s.s_suppkey in (
            select
              ps.ps_suppkey
            from
              partsupp ps
            where
              ps. ps_partkey in (
                select
                  p.p_partkey
                from
                  part p
                where
                  p.p_name like 'antique%'
              )
              and ps.ps_availqty > (
                select
                  0.5 * sum(l.l_quantity)
                from
                  lineitem l
                where
                  l.l_partkey = ps.ps_partkey
                  and l.l_suppkey = ps.ps_suppkey
                  and l.l_shipdate >= date '1993-01-01'
                  and l.l_shipdate < date '1993-01-01' + interval '1' year
              )
          )
          and s.s_nationkey = n.n_nationkey
          and n.n_name = 'KENYA'
        order by
          s.s_name
        
      • Plan before SubQueryRemoveRule.FILTER:
        LogicalSort(sort0=[$0], dir0=[ASC])
          LogicalProject(s_name=[$1], s_address=[$2])
            LogicalFilter(condition=[AND(IN($0, {
        		LogicalProject(ps_suppkey=[$1])
        		  LogicalFilter(condition=[AND(IN($0, {
        				LogicalProject(p_partkey=[$0])
        				  LogicalFilter(condition=[LIKE($1, 'antique%')])
        				    LogicalTableScan(table=[[part, source: [selectedFields=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment]]]])
        				}), >($2, $SCALAR_QUERY({
        				LogicalProject(EXPR$0=[*(0.5, $0)])
        				  LogicalAggregate(group=[{}], agg#0=[SUM($0)])
        				    LogicalProject(l_quantity=[$4])
        				      LogicalFilter(condition=[AND(=($1, $cor0.ps_partkey), =($2, $cor0.ps_suppkey), >=($10, 1993-01-01), <($10, DATETIME_PLUS(1993-01-01, 12)))])
        				        LogicalTableScan(table=[[lineitem, source: [selectedFields=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]]]])
        				})))], variablesSet=[[$cor0]])
        		    LogicalTableScan(table=[[partsupp, source: [selectedFields=[ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment]]]])
        		}), =($3, $7), =($8, 'KENYA'))])
              LogicalJoin(condition=[true], joinType=[inner])
                LogicalTableScan(table=[[supplier, source: [selectedFields=[s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment]]]])
                LogicalTableScan(table=[[nation, source: [selectedFields=[n_nationkey, n_name, n_regionkey, n_comment]]]])
        
      • Plan after SubQueryRemoveRule.FILTER (also differs from the plan when ConfigBuilder.withExpand(true)):
        LogicalSort(sort0=[$0], dir0=[ASC])
          LogicalProject(s_name=[$1], s_address=[$2])
            LogicalProject(s_suppkey=[$0], s_name=[$1], s_address=[$2], s_nationkey=[$3], s_phone=[$4], s_acctbal=[$5], s_comment=[$6], n_nationkey=[$7], n_name=[$8], n_regionkey=[$9], n_comment=[$10])
              LogicalFilter(condition=[AND(=($3, $7), =($8, 'KENYA'))])
                LogicalJoin(condition=[=($0, $11)], joinType=[inner])
                  LogicalJoin(condition=[true], joinType=[inner])
                    LogicalTableScan(table=[[supplier, source: [selectedFields=[s_suppkey, s_name, s_address, s_nationkey, s_phone, s_acctbal, s_comment]]]])
                    LogicalTableScan(table=[[nation, source: [selectedFields=[n_nationkey, n_name, n_regionkey, n_comment]]]])
                  LogicalAggregate(group=[{0}])
                    LogicalProject(ps_suppkey=[$1])
                      LogicalProject(ps_partkey=[$0], ps_suppkey=[$1], ps_availqty=[$2], ps_supplycost=[$3], ps_comment=[$4])
                        LogicalFilter(condition=[>($2, $6)])
                          LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0, 1}])
                            LogicalFilter(condition=[=($0, $5)])
                              LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{}])   <<<--- the problem is here!
                                LogicalTableScan(table=[[partsupp, source: [selectedFields=[ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, ps_comment]]]])
                                LogicalAggregate(group=[{0}])
                                  LogicalProject(p_partkey=[$0])
                                    LogicalFilter(condition=[LIKE($1, 'antique%')])
                                      LogicalTableScan(table=[[part, source: [selectedFields=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, p_container, p_retailprice, p_comment]]]])
                            LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
                              LogicalProject(EXPR$0=[*(0.5, $0)])
                                LogicalAggregate(group=[{}], agg#0=[SUM($0)])
                                  LogicalProject(l_quantity=[$4])
                                    LogicalFilter(condition=[AND(=($1, $cor0.ps_partkey), =($2, $cor0.ps_suppkey), >=($10, 1993-01-01), <($10, DATETIME_PLUS(1993-01-01, 12)))])
                                      LogicalTableScan(table=[[lineitem, source: [selectedFields=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]]]])
        

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        Resolved in release 1.15.0 (2017-12-11).

        Show
        julianhyde Julian Hyde added a comment - Resolved in release 1.15.0 (2017-12-11).
        Hide
        julianhyde Julian Hyde added a comment -
        Show
        julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/d7c4627c ; thanks for the PR, Liao Xintao !
        Hide
        Liao Liao Xintao added a comment -
        Show
        Liao Liao Xintao added a comment - Julian Hyde I have created a pull request: https://github.com/apache/calcite/pull/556
        Hide
        Liao Liao Xintao added a comment -

        ok, I'm on it.

        Show
        Liao Liao Xintao added a comment - ok, I'm on it.
        Hide
        julianhyde Julian Hyde added a comment -

        Can you please convert the patch to a pull request, add test case(s), and make sure existing tests all pass.

        Show
        julianhyde Julian Hyde added a comment - Can you please convert the patch to a pull request, add test case(s), and make sure existing tests all pass.
        Hide
        Liao Liao Xintao added a comment -

        I try to fix it by re-getting the "variablesSet" from the subquery, instead of that getting from the Filter directly, and it works.
        But I'm not sure that something more should be done.

        Show
        Liao Liao Xintao added a comment - I try to fix it by re-getting the "variablesSet" from the subquery, instead of that getting from the Filter directly, and it works. But I'm not sure that something more should be done.

          People

          • Assignee:
            julianhyde Julian Hyde
            Reporter:
            Liao Liao Xintao
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development