  1. Calcite
  2. CALCITE-938

More accurate rowCount for Aggregate applied to already unique keys

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      If the columns in a "select distinct" are already distinct, the planner holds two equivalent rels, one before and one after AggregateRemoveRule is applied.

      before              after
      ------              -----
      agg
       |                  input
      input
      rowCount 10.0       rowCount 100.0
      

      Based on the default implementation of rel metadata, the rowCount of the "before" rel is only 1/10 of that of the "after" rel, even though the two rels are equivalent and the "after" rel is definitely cheaper. So the Volcano planner would most likely either fail to pick the cheapest one or end up in an inconsistent state due to CALCITE-830.
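
      The mismatch above can be illustrated with a small, self-contained sketch. It is a toy model, not Calcite code: the class and method names are hypothetical, and the 1/10 factor is taken from the figures in this description rather than from the Calcite source. The idea is that when the group keys contain a key that is already unique in the input, grouping cannot merge any rows, so the estimate should equal the input rowCount.

```java
import java.util.Set;

/** Toy model, not Calcite code: every name here is hypothetical. */
public class AggregateRowCountSketch {

  /** Default-style estimate: assume grouping keeps 1/10 of the input rows. */
  public static double defaultAggregateRowCount(double inputRowCount) {
    return inputRowCount / 10.0;
  }

  /**
   * Uniqueness-aware estimate. If the group keys cover some key that is
   * already unique in the input, no two rows can fall into the same group,
   * so the row count is unchanged. Otherwise fall back to the default.
   */
  public static double aggregateRowCount(double inputRowCount,
      Set<Integer> groupKeys, Set<Set<Integer>> uniqueKeys) {
    for (Set<Integer> uniqueKey : uniqueKeys) {
      if (groupKeys.containsAll(uniqueKey)) {
        return inputRowCount;
      }
    }
    return defaultAggregateRowCount(inputRowCount);
  }

  public static void main(String[] args) {
    // Assumption for illustration: column 0 is a unique key of the input.
    Set<Set<Integer>> uniqueKeys = Set.of(Set.of(0));
    System.out.println(defaultAggregateRowCount(100.0));                // prints 10.0
    System.out.println(aggregateRowCount(100.0, Set.of(0), uniqueKeys)); // prints 100.0
  }
}
```

      With the uniqueness-aware estimate, the "before" and "after" rels report the same rowCount, so the cost comparison depends only on the extra work done by the aggregate.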

      An example (based on the EnumerableRel cost model):
      The plan for

      select empno, d.deptno
      from "scott".emp
      join (select distinct deptno from "scott".dept) d
      using (deptno);
      

      would be

      EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
        EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
          EnumerableAggregate(group=[$0])
            EnumerableTableScan(table=[[scott, DEPT]])
          EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
            EnumerableTableScan(table=[[scott, EMP]])
      

      whereas it should be

      EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
        EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
          EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
            EnumerableTableScan(table=[[scott, DEPT]])
          EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
            EnumerableTableScan(table=[[scott, EMP]])
      

        Activity

        maryannxue Maryann Xue added a comment -

        Unlike rule matching (as in AggregateRemoveRule), metadata calculation needs a better way to return columnUniqueness for a RelSubset.
        There was an implementation in RelMdColumnUniqueness for RelSubset, but it was deliberately removed from real use. I figured out (after running the tests) that the reason was that the old implementation would cause an infinite loop, since there can be cyclic links in a RelSubset after applying ProjectRemoveRule.
        One way would be to improve the old implementation by detecting and breaking the cyclic links when making recursive calls. But for the purpose of calculating cost only, we might not need to return any meaningful value for a RelSubset that is still in an unimplementable state. So the current fix just returns the value for the "best" rel if it is available, and otherwise returns unknown.
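
        The approach described in this comment can be sketched roughly as follows. This is a toy model rather than the actual patch: the Node, Leaf, and Subset types are hypothetical stand-ins, and a null Boolean is used here to mean "unknown". The point is only that consulting the single "best" member avoids walking all members of the subset, which is where the cyclic links would otherwise cause unbounded recursion.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Calcite code: every name here is hypothetical. */
public class SubsetUniquenessSketch {

  /** A plan node that may or may not know whether its columns are unique. */
  public interface Node {
    /** Returns TRUE or FALSE when known, or null for "unknown". */
    Boolean areColumnsUnique();
  }

  /** A node with a fixed answer. */
  public static final class Leaf implements Node {
    private final Boolean unique;

    public Leaf(Boolean unique) {
      this.unique = unique;
    }

    @Override public Boolean areColumnsUnique() {
      return unique;
    }
  }

  /** A set of equivalent nodes; best is null until an implementable member is found. */
  public static final class Subset implements Node {
    public final List<Node> members = new ArrayList<>();
    public Node best;

    @Override public Boolean areColumnsUnique() {
      // Only "best" is consulted. The members list is never traversed,
      // so a member that links back to this subset cannot cause a loop.
      return best == null ? null : best.areColumnsUnique();
    }
  }

  public static void main(String[] args) {
    Subset subset = new Subset();
    subset.members.add(subset);                    // a cyclic link, ignored by the lookup
    System.out.println(subset.areColumnsUnique()); // unknown while there is no best rel
    subset.best = new Leaf(Boolean.TRUE);
    System.out.println(subset.areColumnsUnique()); // answer delegated to the best rel
  }
}
```

        The trade-off is that a subset without a "best" rel reports unknown even when one of its members could answer, which is acceptable if the value is only needed for costing.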

        julianhyde Julian Hyde added a comment -

        Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/52b06213. Thanks for the patch, Maryann Xue!

        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Resolved in release 1.5.0 (2015-11-10)


          People

          • Assignee:
            maryannxue Maryann Xue
            Reporter:
            maryannxue Maryann Xue