Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-714

When de-correlating, push join condition into subquery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0-incubating
    • Fix Version/s: 1.4.0-incubating
    • Component/s: None
    • Labels:
      None

      Description

      When decorrelating a scalar aggregate subquery, we push copies of the tables from the outer query block into the subquery but don't push the join condition, thereby creating a cartesian join. This seems to be a regression.

      Query:

      select count(*) 
          from dfs.`/Users/asinha/data/tpch-sf1/nation` n, dfs.`/Users/asinha/data/tpch-sf1/orders` o 
               where n.n_nationkey = o.o_orderkey 
               and n.n_nationkey > (select avg(ps.ps_suppkey) from dfs.`/Users/asinha/data/tpch-sf1/partsupp` ps
                                     where n.n_regionkey = ps.ps_partkey);
      

      Here's the plan on an earlier version of Calcite (I am not sure of the version number but it was before the decorrelation refactoring), Note the join between nation and orders at the leaf level has a equi-join condition.

       
      AggregateRel(group=[{}], EXPR$0=[COUNT()])
        ProjectRel($f0=[$0])
          ProjectRel($f0=[0])
            FilterRel(condition=[>($1, $5)])
              ProjectRel(*=[$0], n_nationkey=[$1], n_regionkey=[$2], *0=[$3], o_orderkey=[$4], EXPR$0=[$6])
                JoinRel(condition=[=($2, $5)], joinType=[left])
                  JoinRel(condition=[=($1, $4)], joinType=[inner])
                    EnumerableTableAccessRel(table=[[dfs, /Users/asinha/data/tpch-sf1/nation]])
                    EnumerableTableAccessRel(table=[[dfs, /Users/asinha/data/tpch-sf1/orders]])
                  AggregateRel(group=[{0}], EXPR$0=[AVG($1)])
                    ProjectRel($f0=[$1], ps_suppkey=[$0])
                      ProjectRel(ps_suppkey=[$2], $f0=[$3])
                        FilterRel(condition=[=($3, $1)])
                          JoinRel(condition=[true], joinType=[inner])
                            EnumerableTableAccessRel(table=[[dfs, /Users/asinha/data/tpch-sf1/partsupp]])
                            AggregateRel(group=[{0}])
                              ProjectRel($f0=[$2])
                                JoinRel(condition=[=($1, $4)], joinType=[inner])
                                  EnumerableTableAccessRel(table=[[dfs, /Users/asinha/data/tpch-sf1/nation]])
                                  EnumerableTableAccessRel(table=[[dfs, /Users/asinha/data/tpch-sf1/orders]])
      

      Here's the new plan (I am on version 1.1.0 but I think the plan has not changed in the latest version). Note the join between nation and orders at the leaf level does not have any join condition.

      LogicalAggregate(group=[{}], EXPR$0=[COUNT()])
        LogicalProject($f0=[$0])
          LogicalProject($f0=[0])
            LogicalFilter(condition=[AND(=($1, $4), >($1, $5))])
              LogicalProject(*=[$0], n_nationkey=[$1], n_regionkey=[$2], *0=[$3], o_orderkey=[$4], EXPR$0=[$6])
                LogicalJoin(condition=[=($2, $5)], joinType=[left])
                  LogicalJoin(condition=[true], joinType=[inner])
                    EnumerableTableScan(table=[[dfs, /Users/asinha/data/tpch-sf1/nation]])
                    EnumerableTableScan(table=[[dfs, /Users/asinha/data/tpch-sf1/orders]])
                  LogicalAggregate(group=[{0}], EXPR$0=[AVG($1)])
                    LogicalProject(n_regionkey=[$1], ps_suppkey=[$0])
                      LogicalProject(ps_suppkey=[$2], n_regionkey=[$3])
                        LogicalFilter(condition=[=($3, $1)])
                          LogicalJoin(condition=[true], joinType=[inner])
                            EnumerableTableScan(table=[[dfs, /Users/asinha/data/tpch-sf1/partsupp]])
                            LogicalAggregate(group=[{0}])
                              LogicalProject(n_regionkey=[$2])
                                LogicalJoin(condition=[true], joinType=[inner])
                                  EnumerableTableScan(table=[[dfs, /Users/asinha/data/tpch-sf1/nation]])
                                  EnumerableTableScan(table=[[dfs, /Users/asinha/data/tpch-sf1/orders]])
      

        Issue Links

          Activity

          Hide
          amansinha100 Aman Sinha added a comment -

          I am looking into this. Here's another example query that does not have a cartesian join in the query but the decorrelated version creates a cartesian join:

                SELECT 1 FROM emp e1, dept d1 where e1.deptno = d1.deptno 
                      and e1.sal > (select avg(sal) from emp e2 where e1.empno = e2.empno)
          
          Show
          amansinha100 Aman Sinha added a comment - I am looking into this. Here's another example query that does not have a cartesian join in the query but the decorrelated version creates a cartesian join: SELECT 1 FROM emp e1, dept d1 where e1.deptno = d1.deptno and e1.sal > (select avg(sal) from emp e2 where e1.empno = e2.empno)
          Hide
          amansinha100 Aman Sinha added a comment -

          I have most of this working and have pushed my changes to:
          https://github.com/amansinha100/incubator-calcite/tree/correlation2

          There are plan changes and it would be good for someone to review them.
          There are 2 unit tests that are still failing that I am trying to resolve but would like initial feedback.

          Show
          amansinha100 Aman Sinha added a comment - I have most of this working and have pushed my changes to: https://github.com/amansinha100/incubator-calcite/tree/correlation2 There are plan changes and it would be good for someone to review them. There are 2 unit tests that are still failing that I am trying to resolve but would like initial feedback.
          Hide
          julianhyde Julian Hyde added a comment -

          This looks good. I can see some improvements in some existing queries.

          The failure in SqlToRelConverterTest.testNestedCorrelationsDecorrelated seems to be because a Correlate is missed. And the failure in JdbcTest seems to be something similar. You've probably figured that out already. Let me know if you need further help. If you can get all tests passing I will accept this patch.

          I made a couple of tweaks to allow checkstyle to pass. See https://github.com/julianhyde/incubator-calcite/tree/714-correlation.

          I think you should remove classifyFiltersForCorrelation - you could instead call classifyFilters with joinType = INNER.

          Show
          julianhyde Julian Hyde added a comment - This looks good. I can see some improvements in some existing queries. The failure in SqlToRelConverterTest.testNestedCorrelationsDecorrelated seems to be because a Correlate is missed. And the failure in JdbcTest seems to be something similar. You've probably figured that out already. Let me know if you need further help. If you can get all tests passing I will accept this patch. I made a couple of tweaks to allow checkstyle to pass. See https://github.com/julianhyde/incubator-calcite/tree/714-correlation . I think you should remove classifyFiltersForCorrelation - you could instead call classifyFilters with joinType = INNER.
          Hide
          amansinha100 Aman Sinha added a comment -

          Pull request:
          https://github.com/apache/incubator-calcite/pull/110

          I addressed the failures in the unit tests: the ones related to LogicalCorrelate still being present were addressed by having the FilterProjectTransposeRule check for presence of correlation in the Filter and if yes, not push below Project.
          The other plan changes in RelOptRulesTest were expected based on the previous patch and I had not updated them earlier.
          I also removed the classFiltersForCorrelation as suggested by Julian and used existing method.

          Show
          amansinha100 Aman Sinha added a comment - Pull request: https://github.com/apache/incubator-calcite/pull/110 I addressed the failures in the unit tests: the ones related to LogicalCorrelate still being present were addressed by having the FilterProjectTransposeRule check for presence of correlation in the Filter and if yes, not push below Project. The other plan changes in RelOptRulesTest were expected based on the previous patch and I had not updated them earlier. I also removed the classFiltersForCorrelation as suggested by Julian and used existing method.
          Hide
          julianhyde Julian Hyde added a comment -

          Looks good. I'll add a fix-up commit that will use a static instance of FoundOne as Vladimir Sitnikov suggests. Will merge to master shortly.

          Show
          julianhyde Julian Hyde added a comment - Looks good. I'll add a fix-up commit that will use a static instance of FoundOne as Vladimir Sitnikov suggests. Will merge to master shortly.
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/29030d8c and http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/e03dafcf .
          Hide
          jnadeau Jacques Nadeau added a comment -

          Resolved in release 1.4.0-incubating (2015-08-23)

          Show
          jnadeau Jacques Nadeau added a comment - Resolved in release 1.4.0-incubating (2015-08-23)

            People

            • Assignee:
              amansinha100 Aman Sinha
              Reporter:
              amansinha100 Aman Sinha
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development