Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
Description
TLDR; CoreRules.AGGREGATE_REMOVE is fired by a HepPlanner but while removing the Aggregate, instead of returning the Aggregate's input, it returns a VolcanoPlanner's RelSubset with the input, which leads to unforeseeable consequences.
Details: This seems a strange issue that happens because several factors occur.
I first reproduced it on my application with the following query (on TPCH):
SELECT c.c_custkey FROM customer c WHERE c.c_name IN ('271', '272', '273', '274', '275', '276', '342', '343', '344', '345','346', '347', '348', '349', '350', '351', '352', '353', '354', '355', '356', '357', '358', '359', '360') AND EXISTS (SELECT 1 FROM orders o WHERE o.o_custkey = c.c_custkey)
But the issue can be reproduced also in Calcite by adding this test into HepPlannerTest:
@Test void testAggregateRemove() { final RelBuilder builder = RelBuilderTest.createBuilder(c -> c.withAggregateUnique(true)); final RelNode root = builder .values(new String[]{"i"}, 1, 2, 3) // important to have values sorted .distinct() .build(); final HepProgram program = new HepProgramBuilder() .addRuleInstance(CoreRules.AGGREGATE_REMOVE) .build(); final HepPlanner planner = new HepPlanner(program); planner.setRoot(root); final RelNode result = planner.findBestExp(); assertThat(result, is(instanceOf(LogicalValues.class))); // fails because result is a RelSubset }
The important elements are: firstly our RelOptCluster has a VolcanoPlanner as planner (so any relNode.getCluster().getPlanner() call that we execute will return a VolcanoPlanner instance). Nevertheless we also apply some rules via a HepPlanner. I think this is a quite common strategy in Calcite clients to obtain a better performance: first apply a subset of rules that are always beneficial via a HepPlanner, and then apply the "main" set of rules via the cost-based VolcanoPlanner.
Secondly, we have AggregateRemoveRule, which we use in the HepPlanner phase.
This rule contains the following code:
@Override public void onMatch(RelOptRuleCall call) { final Aggregate aggregate = call.rel(0); final RelNode input = aggregate.getInput(); ... final RelNode newInput = convert(input, aggregate.getTraitSet().simplify()); // <-- *** [1] relBuilder.push(newInput); ... call.getPlanner().prune(aggregate); // <-- *** [2] call.transformTo(relBuilder.build()); }
Notice the line [2] which uses call.getPlanner() to call the prune method. By using call.getPlanner() we get the correct planner of the rule that is being fired, in this case a HepPlanner, so we end up calling HepPlanner#prune, which is fine.
However, the line [1] calls the RelOptRule#convert static method:
public static RelNode convert(RelNode rel, RelTraitSet toTraits) { RelOptPlanner planner = rel.getCluster().getPlanner(); // <-- *** [3] RelTraitSet outTraits = rel.getTraitSet(); for (int i = 0; i < toTraits.size(); i++) { RelTrait toTrait = toTraits.getTrait(i); if (toTrait != null) outTraits = outTraits.replace(i, toTrait); } if (rel.getTraitSet().matches(outTraits)) return rel; return planner.changeTraits(rel, outTraits); // <-- *** [4] }
Notice how in this case, the planner is obtained from the relNode's cluster [3], in our case that would be the VolcanoPlanner, which is potentially problematic. Further down, if the relNode matches the outTraits, no action is done and the same relNode is returned, no problem here. But, if it does not match them, then RelOptPlanner#changeTraits will be called, i.e. VolcanoPlanner#changeTraits [4], and this is where the problem will originate: in our scenario VolcanoPlanner#changeTraits will return a Volcano's RelSubset, which is completely unhandable by the HepPlanner that triggered the rule, and that leads to the incorrect plan returned by the HepPlanner.
In this case, what happens with our original query (LogicalValues with sorted values), we get to RelOptRule#convert with the RelNode being a LogicalValues with Convention.NONE + Collation[0], and the toTraits are the ones from the LogicalAggregate that we are removing: Convention.NONE + Collation[] . Since the traits from the LogicalValues do not match the LogicalAgggregate traits (Collation[0] != Collation[]), the RelOptPlanner#changeTraits is called and the problem occurs. I am not sure why here RelTraitSet#matches is used (which computes an exact match, hence returning false), rather than RelTraitSet#satisfies, which would have returned true, because a sorted LogicalValues (Collation[0]) satisfies the unsorted Collation[], but I assume there is a reason for that.
As a workaround, if the LogicalValues elements are NOT in order, then the problem is avoided: we deal with a LogicalValues with Collation[] , so inside RelOptRule#convert the LogicalValues traits (Convention.NONE + Collation[]) match the LogicalAggregates ones (Convention.NONE + Collation[]), so the method returns without calling RelOptPlanner#changeTraits, so the problem does not happen. This can be confirmed by modifying the proposed test:
@Test void testAggregateRemoveOk() { final RelBuilder builder = RelBuilderTest.createBuilder(c -> c.withAggregateUnique(true)); final RelNode root = builder .values(new String[]{"i"}, 1, 42, 3) // not sorted .distinct() .build(); final HepProgram program = new HepProgramBuilder() .addRuleInstance(CoreRules.AGGREGATE_REMOVE) .build(); final HepPlanner planner = new HepPlanner(program); planner.setRoot(root); final RelNode result = planner.findBestExp(); assertThat(result, is(instanceOf(LogicalValues.class))); // ok }
Attachments
Issue Links
- is related to
-
CALCITE-5370 AGGREGATE_REMOVE will convert certain SQL statement into RelSubSet
- Closed
- relates to
-
CALCITE-1536 Initialize cluster before planner
- Open
-
CALCITE-1681 Provide a way to copy RelNode trees between clusters
- Open
- links to