Calcite / CALCITE-6513

FilterProjectTransposeRule may cause OOM when Project expressions are complex


Details

    Description

      CALCITE-3774 prevents merging Projects when the expressions of the merged Project would be too complex, because such expressions can lead to slow compilation or out-of-memory errors.

      However, when a Filter on top of the Projects has a predicate that references the complex expressions, FilterProjectTransposeRule pushes the Filter down below the bottom Project. In doing so it merges the Project expressions into the predicate without applying the complexity limit, which can cause OOM.
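      To make the complexity limit concrete, here is a minimal Python sketch under stated assumptions: expressions are modeled as nested tuples (not Calcite's RexNode classes), complexity is a plain node count, and a merge is refused when the merged expressions have more than `bloat` nodes beyond the top and bottom Projects combined. This is our approximation of the kind of check CALCITE-3774 introduced; all helper names are hypothetical and Calcite's actual code may differ. The usage shows why repeated references matter: every reference to `$0` copies the whole bottom expression.

```python
# Hypothetical toy model, not Calcite's RexNode API:
#   ("lit", v) is a literal, ("ref", i) is the input reference $i,
#   ("call", op, operand, ...) is an operator call.
def lit(v):
    return ("lit", v)

def ref(i):
    return ("ref", i)

def call(op, *operands):
    return ("call", op, *operands)

def node_count(e):
    # A call counts itself plus all of its operands; leaves count as 1.
    if e[0] == "call":
        return 1 + sum(node_count(o) for o in e[2:])
    return 1

def push_past_project(e, bottom):
    # Rewrite e (written against the bottom Project's output) in terms of
    # the bottom Project's input by replacing every $i with bottom[i].
    if e[0] == "ref":
        return bottom[e[1]]
    if e[0] == "call":
        return call(e[1], *(push_past_project(o, bottom) for o in e[2:]))
    return e

def total(exprs):
    return sum(node_count(e) for e in exprs)

def merge_unless_bloat(top, bottom, bloat):
    # Assumed rule: refuse the merge (return None) when the merged list has
    # more than `bloat` nodes beyond the top and bottom lists combined.
    merged = [push_past_project(e, bottom) for e in top]
    if total(merged) > total(top) + total(bottom) + bloat:
        return None
    return merged

# Usage: the bottom Project computes $0 + $1 (3 nodes).
bottom = [call("+", ref(0), ref(1))]
once = [call("*", ref(0), lit(2))]                  # references $0 once
four_times = [call("+", call("+", ref(0), ref(0)),
                   call("+", ref(0), ref(0)))]      # references $0 four times
print(merge_unless_bloat(once, bottom, bloat=0) is not None)    # True
print(merge_unless_bloat(four_times, bottom, bloat=0) is None)  # True
```

      With a single reference the merged expression (5 nodes) stays within the 6-node allowance; with four references it grows to 15 nodes against an allowance of 10, so the merge is refused.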

      The issue was initially reproduced in Hive with the Hive version of FilterProjectTransposeRule. See HIVE-28264.

      Calcite is also affected: https://github.com/kasakrisz/calcite/commit/b35a02f368624a9c4768f348cd072a95ed6de3e1

      Consider the following query:

      SELECT x1 from
          (SELECT 'L1' || x0  || x0 || x0 || x0 as x1 from
              (SELECT 'L0' || ENAME || ENAME || ENAME || ENAME as x0 from emp) t1) t2
      WHERE x1 = 'Something'
      

      Set the bloat property of RelBuilder.Config to 3.
      The initial plan of the query is:

      LogicalProject(X1=[$0])
        LogicalFilter(condition=[=($0, 'Something')])
          LogicalProject(X1=[||(||(||(||('L1', $0), $0), $0), $0)])
            LogicalProject(X0=[||(||(||(||('L0', $1), $1), $1), $1)])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
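      A worked count for this plan, assuming complexity is measured as the number of expression nodes and that a merge is rejected when the merged expression exceeds the node counts of both inputs plus the bloat value (our reading of the CALCITE-3774 check, not confirmed by this issue). The innermost ||('L0', $1) has 3 nodes, and each further || adds itself and one $1. In the merged X1, each of the four references to $0 is replaced by a copy of X0.

```latex
n(e) =
\begin{cases}
  1 & \text{if } e \text{ is a literal or an input reference},\\
  1 + \sum_{i=1}^{k} n(e_i) & \text{if } e = f(e_1, \ldots, e_k).
\end{cases}

\begin{align*}
n(X0) &= 3 + 3 \cdot 2 = 9, \qquad n(X1) = 9,\\
n\bigl(X1[\$0 := X0]\bigr) &= (1 + 1 + 9) + 3 \cdot (1 + 9) = 41
  \;>\; n(X1) + n(X0) + \mathit{bloat} = 9 + 9 + 3 = 21.
\end{align*}
```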
      

      The expressions in the two Project operators are mergeable, but the complexity of the merged expression exceeds the bloat limit of 3 in this example.
      However, when FilterProjectTransposeRule is applied, the expressions are merged anyway: the expression in the upper Project references the expression in the lower Project, and the predicate of the Filter references the expression in the upper Project. The limit is not applied in this case, so we end up with the following plan:

      LogicalProject(X1=[$0])
        LogicalProject(X1=[||(||(||(||('L1', $0), $0), $0), $0)])
          LogicalProject(X0=[||(||(||(||('L0', $1), $1), $1), $1)])
            LogicalFilter(condition=[=(||(||(||(||('L1', ||(||(||(||('L0', $1), $1), $1), $1)), ||(||(||(||('L0', $1), $1), $1), $1)), ||(||(||(||('L0', $1), $1), $1), $1)), ||(||(||(||('L0', $1), $1), $1), $1)), 'Something')])
              LogicalTableScan(table=[[CATALOG, SALES, EMP]])
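      The following Python sketch replays this push-down with a toy expression model (hypothetical, not Calcite's RexNode API). It pushes the condition =($0, 'Something') through the upper Project and then through the lower Project. Without a limit the condition grows to the 43-node predicate shown in the plan above. The guarded variant applies an assumed per-step bloat check, comparable to the Project-merge check, and stops between the two Projects; it is only an illustration, not the fix adopted for this issue.

```python
# Hypothetical toy model, not Calcite's RexNode API:
#   ("lit", v), ("ref", i) for $i, ("call", op, operand, ...).
def lit(v):
    return ("lit", v)

def ref(i):
    return ("ref", i)

def call(op, *operands):
    return ("call", op, *operands)

def node_count(e):
    # A call counts itself plus its operands; leaves count as 1.
    if e[0] == "call":
        return 1 + sum(node_count(o) for o in e[2:])
    return 1

def push_past_project(e, project):
    # Replace each $i with the i-th expression of the Project being crossed.
    if e[0] == "ref":
        return project[e[1]]
    if e[0] == "call":
        return call(e[1], *(push_past_project(o, project) for o in e[2:]))
    return e

def concat_chain(prefix, r):
    # ||(||(||(||(prefix, r), r), r), r), the shape of X0 and X1 in the plan.
    e = call("||", lit(prefix), r)
    for _ in range(3):
        e = call("||", e, r)
    return e

lower = [concat_chain("L0", ref(1))]  # X0, computed from ENAME ($1 of EMP)
upper = [concat_chain("L1", ref(0))]  # X1, computed from X0
condition = call("=", ref(0), lit("Something"))

def push_filter_down(cond, projects, bloat=None):
    # projects is ordered top to bottom. With bloat=None every step is taken,
    # mirroring the unguarded behavior described in this issue. Otherwise a
    # step is refused when the rewritten condition has more nodes than the
    # condition plus the Project plus `bloat` (an assumed rule).
    steps = 0
    for project in projects:
        rewritten = push_past_project(cond, project)
        if bloat is not None:
            limit = node_count(cond) + sum(node_count(p) for p in project) + bloat
            if node_count(rewritten) > limit:
                break
        cond = rewritten
        steps += 1
    return cond, steps

naive, naive_steps = push_filter_down(condition, [upper, lower])
print(node_count(naive), naive_steps)      # 43 2
guarded, guarded_steps = push_filter_down(condition, [upper, lower], bloat=3)
print(node_count(guarded), guarded_steps)  # 11 1
```

      In this model the first step is cheap (the condition grows from 3 to 11 nodes); only the second step, which copies X0 into each of the four references in X1, blows up the predicate.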
      

            People

              Krisztian Kasa (kkasa)