Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-793

Planner requires unnecessary collation when using materialized view

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0-incubating
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Labels:
      None

      Description

      When a query does not have an ORDER BY clause, we should ignore the collation trait of the main table plan and should not request the materialized view plan to have the same collation.

      For example, we have a table 'A' sorted by primary key 'id', and we have a materialized view 'V' projected from 'A' which is sorted by column 'col1'. And now we have a query like "select id, col0, col1, col2 from A where col1 < '10'".
      The main table plan will come out like a Filter on top of a TableScan of 'A', while the materialized view plan should also be something like a Filter on top of a TableScan of 'V' and it should not have a Sort, so that if doing a col1 related filter on col1 ordered table 'V' is cheaper, the materialized view plan will be chosen.

        Issue Links

          Activity

          Hide
          maryannxue Maryann Xue added a comment -

          Ran through all the tests, but one failed: LatticeTest.testJG(). The only difference with the ref was that the SUM function hadn't been changed to $SUM0 function:

          Expected:
          JdbcToEnumerableConverter
          JdbcAggregate(group=[

          {7, 16, 25, 27, 31, 37}], m0=[COUNT()], m1=[$SUM0($5)], m2=[$SUM0($7)])
          JdbcJoin(condition=[=($8, $33)], joinType=[inner])
          JdbcJoin(condition=[=($1, $23)], joinType=[inner])
          JdbcJoin(condition=[=($0, $9)], joinType=[inner])
          JdbcTableScan(table=[[foodmart, sales_fact_1997]])
          JdbcTableScan(table=[[foodmart, product]])
          JdbcTableScan(table=[[foodmart, time_by_day]])
          JdbcTableScan(table=[[foodmart, product_class]])

          Actual:
          JdbcToEnumerableConverter
          JdbcAggregate(group=[{7, 16, 25, 27, 31, 37}

          ], m0=[COUNT()], m1=[SUM($5)], m2=[SUM($7)])
          JdbcJoin(condition=[=($8, $33)], joinType=[inner])
          JdbcJoin(condition=[=($1, $23)], joinType=[inner])
          JdbcJoin(condition=[=($0, $9)], joinType=[inner])
          JdbcTableScan(table=[[foodmart, sales_fact_1997]])
          JdbcTableScan(table=[[foodmart, product]])
          JdbcTableScan(table=[[foodmart, time_by_day]])
          JdbcTableScan(table=[[foodmart, product_class]])

          The LogicalAggregate rel was an identical copy of the original LogicalAggregate rel after going through the line of code I added:

                result = result.copy(result.getTraitSet().replace(RelCollations.EMPTY), result.getInputs());
          

          I suspect it might be the unstable costing of the Aggregate rel with different sets of agg functions (SUM and $SUM0 have the same cost), so I made an adjustment to Aggregate.computeSelfCost() just for test purpose:

             @Override public RelOptCost computeSelfCost(RelOptPlanner planner) {
               // REVIEW jvs 24-Aug-2008:  This is bogus, but no more bogus
               // than what's currently in Join.
               double rowCount = RelMetadataQuery.getRowCount(this);
               // Aggregates with more aggregate functions cost a bit more
               float multiplier = 1f + (float) aggCalls.size() * 0.125f;
          +    for (AggregateCall aggCall : aggCalls) {
          +    	if (aggCall.getAggregation().getName().equals("SUM")) {
          +        multiplier += 0.0125f;
          +    	}
          +    }
               return planner.getCostFactory().makeCost(rowCount * multiplier, 0, 0);
            }
          

          That way the test case passed.

          I'll proceed to add a new test case if this first patch looks ok.

          Show
          maryannxue Maryann Xue added a comment - Ran through all the tests, but one failed: LatticeTest.testJG(). The only difference with the ref was that the SUM function hadn't been changed to $SUM0 function: Expected: JdbcToEnumerableConverter JdbcAggregate(group=[ {7, 16, 25, 27, 31, 37}], m0= [COUNT()] , m1= [$SUM0($5)] , m2= [$SUM0($7)] ) JdbcJoin(condition= [=($8, $33)] , joinType= [inner] ) JdbcJoin(condition= [=($1, $23)] , joinType= [inner] ) JdbcJoin(condition= [=($0, $9)] , joinType= [inner] ) JdbcTableScan(table=[ [foodmart, sales_fact_1997] ]) JdbcTableScan(table=[ [foodmart, product] ]) JdbcTableScan(table=[ [foodmart, time_by_day] ]) JdbcTableScan(table=[ [foodmart, product_class] ]) Actual: JdbcToEnumerableConverter JdbcAggregate(group=[{7, 16, 25, 27, 31, 37} ], m0= [COUNT()] , m1= [SUM($5)] , m2= [SUM($7)] ) JdbcJoin(condition= [=($8, $33)] , joinType= [inner] ) JdbcJoin(condition= [=($1, $23)] , joinType= [inner] ) JdbcJoin(condition= [=($0, $9)] , joinType= [inner] ) JdbcTableScan(table=[ [foodmart, sales_fact_1997] ]) JdbcTableScan(table=[ [foodmart, product] ]) JdbcTableScan(table=[ [foodmart, time_by_day] ]) JdbcTableScan(table=[ [foodmart, product_class] ]) The LogicalAggregate rel was an identical copy of the original LogicalAggregate rel after going through the line of code I added: result = result.copy(result.getTraitSet().replace(RelCollations.EMPTY), result.getInputs()); I suspect it might be the unstable costing of the Aggregate rel with different sets of agg functions (SUM and $SUM0 have the same cost), so I made an adjustment to Aggregate.computeSelfCost() just for test purpose: @Override public RelOptCost computeSelfCost(RelOptPlanner planner) { // REVIEW jvs 24-Aug-2008: This is bogus, but no more bogus // than what's currently in Join. double rowCount = RelMetadataQuery.getRowCount( this ); // Aggregates with more aggregate functions cost a bit more float multiplier = 1f + ( float ) aggCalls.size() * 0.125f; + for (AggregateCall aggCall : aggCalls) { + if (aggCall.getAggregation().getName().equals( "SUM" )) { + multiplier += 0.0125f; + } + } return planner.getCostFactory().makeCost(rowCount * multiplier, 0, 0); } That way the test case passed. I'll proceed to add a new test case if this first patch looks ok.
          Hide
          julianhyde Julian Hyde added a comment -

          I've seen changes from SUM to and from $SUM0 before. I agree, it seems to be due to an unstable cost model. If your change fixes that instability that would be great (it might cause a few test failures) – go ahead and submit.

          Show
          julianhyde Julian Hyde added a comment - I've seen changes from SUM to and from $SUM0 before. I agree, it seems to be due to an unstable cost model. If your change fixes that instability that would be great (it might cause a few test failures) – go ahead and submit.
          Hide
          julianhyde Julian Hyde added a comment - - edited

          Maryann Xue, do you have a test case for this? I could try to add something to MaterializationTest, I suppose.

          Show
          julianhyde Julian Hyde added a comment - - edited Maryann Xue , do you have a test case for this? I could try to add something to MaterializationTest, I suppose.
          Hide
          maryannxue Maryann Xue added a comment -

          I was thinking if there's any existing table in HR or FOODMART schema is ordered and then I can create an unordered materialization?

          Show
          maryannxue Maryann Xue added a comment - I was thinking if there's any existing table in HR or FOODMART schema is ordered and then I can create an unordered materialization?
          Hide
          julianhyde Julian Hyde added a comment -

          Yes - I am developing something along those lines.

          Show
          julianhyde Julian Hyde added a comment - Yes - I am developing something along those lines.
          Hide
          maryannxue Maryann Xue added a comment -

          Thanks a lot, Julian Hyde!

          Show
          maryannxue Maryann Xue added a comment - Thanks a lot, Julian Hyde !
          Hide
          julianhyde Julian Hyde added a comment -

          Looks good. I have checked into my new-master branch and will merge to master after 1.4 is released.

          Show
          julianhyde Julian Hyde added a comment - Looks good. I have checked into my new-master branch and will merge to master after 1.4 is released.
          Hide
          julianhyde Julian Hyde added a comment -
          Show
          julianhyde Julian Hyde added a comment - Fixed in http://git-wip-us.apache.org/repos/asf/incubator-calcite/commit/5b02090a . Thanks for the patch, Maryann Xue !
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Resolved in release 1.5.0 (2015-11-10)

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Resolved in release 1.5.0 (2015-11-10)

            People

            • Assignee:
              julianhyde Julian Hyde
              Reporter:
              maryannxue Maryann Xue
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Due:
                Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - 120h
                120h
                Remaining:
                Remaining Estimate - 120h
                120h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development