  1. Calcite
  2. CALCITE-2683

ProjectMergeRule should not be applied when a non-deterministic UDF is referenced more than once

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels:
      None

      Description

      Currently, there are several merge rules for projects, such as CalcMergeRule, ProjectMergeRule, and ProjectCalcMergeRule. These rules should not be applied when a non-deterministic expression in the bottom project is referenced more than once by the top project. Take the following test as an example:

        @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws Exception {
          HepProgram program = new HepProgramBuilder()
                  .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
                  .addRuleInstance(ProjectMergeRule.INSTANCE)
                  .build();
      
          checkPlanning(program,
                  "select name, a as a1, a as a2 from (\n"
                          + "  select *, rand() as a\n"
                          + "  from dept)\n"
                          + "where deptno = 10\n");
        }
      

      The inner select generates `a` from `rand()`, and the outer select generates `a1` and `a2` from `a`. According to the SQL, `a1` should equal `a2`.
      Let's take a look at the resulting plan:

      LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
        LogicalFilter(condition=[=($0, 10)])
          LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
      

      In this plan, `a1` may not equal `a2`, because merging the projects inlines `RAND()` at each reference, so it is evaluated twice. This contradicts the SQL, in which `a1` always equals `a2`.
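
      To make the difference concrete, here is a minimal plain-Java sketch (no Calcite APIs involved) that mimics how the two plan shapes evaluate the expression. The class and method names are hypothetical and exist only for illustration; `DoubleSupplier` stands in for `RAND()`.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

public class NonDeterministicMergeDemo {

  /** Unmerged shape: the bottom project evaluates the expression once; the top project reads it twice. */
  static double[] evalUnmerged(DoubleSupplier rand) {
    double a = rand.getAsDouble();   // bottom project: A=[RAND()]
    return new double[] {a, a};      // top project: A1=[$2], A2=[$2]
  }

  /** Merged shape: the expression is inlined at each reference, so it is evaluated twice. */
  static double[] evalMerged(DoubleSupplier rand) {
    return new double[] {rand.getAsDouble(), rand.getAsDouble()};  // A1=[RAND()], A2=[RAND()]
  }

  public static void main(String[] args) {
    Random random = new Random();
    double[] unmerged = evalUnmerged(random::nextDouble);
    double[] merged = evalMerged(random::nextDouble);
    System.out.println("unmerged: a1 == a2 ? " + (unmerged[0] == unmerged[1]));
    System.out.println("merged:   a1 == a2 ? " + (merged[0] == merged[1]));
  }
}
```

      In the unmerged shape the two columns are always identical; in the merged shape they are two independent draws.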

      One option to solve the problem is to disable these merge rules in such cases, so that the result plan will be:

      LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
        LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
          LogicalFilter(condition=[=($0, 10)])
            LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
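
      The condition under which the merge would be skipped can be sketched as follows. This is a simplified model in plain Java, not Calcite code: `MergeGuard`, `canMerge`, and the way references and determinism are represented are all assumptions made for illustration. A real fix would have to inspect the projects' expressions inside the rules themselves.

```java
import java.util.Arrays;
import java.util.List;

public class MergeGuard {

  /**
   * Returns false if any non-deterministic bottom expression is referenced
   * more than once by the top project (hypothetical helper).
   *
   * @param topRefs for each top-project expression, the bottom-project output indexes it reads
   * @param bottomDeterministic whether each bottom-project expression is deterministic
   */
  static boolean canMerge(List<List<Integer>> topRefs, boolean[] bottomDeterministic) {
    int[] refCount = new int[bottomDeterministic.length];
    for (List<Integer> refs : topRefs) {
      for (int index : refs) {
        refCount[index]++;
      }
    }
    for (int i = 0; i < refCount.length; i++) {
      if (!bottomDeterministic[i] && refCount[i] > 1) {
        return false;  // inlining would evaluate the expression more than once
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Bottom project: DEPTNO=[$0], NAME=[$1], A=[RAND()]
    boolean[] bottomDeterministic = {true, true, false};
    // Top project: NAME=[$1], A1=[$2], A2=[$2]
    List<List<Integer>> topRefs =
        Arrays.asList(Arrays.asList(1), Arrays.asList(2), Arrays.asList(2));
    System.out.println("can merge: " + canMerge(topRefs, bottomDeterministic));
  }
}
```

      With the projects from the example above, `A` is non-deterministic and is referenced twice, so the check rejects the merge. If the top project referenced `A` only once, merging would still be allowed under this model.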
      

      Any suggestions are greatly appreciated.

              People

              • Assignee:
                hequn8128 Hequn Cheng
                Reporter:
                hequn8128 Hequn Cheng
              • Votes:
                0
                Watchers:
                3
