Details
Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Description
RelDecorrelator decorrelates a query in two main steps:
- First, a few Correlate cases are removed via rules (in the removeCorrelationViaRule method).
- Then, the main decorrelation logic is applied (the decorrelateRel methods, called via reflection).
Currently, the rules applied in the first step are hardcoded and cannot be configured.
We are facing a situation where a Correlate is converted via one of these hardcoded rules (RemoveCorrelationForScalarAggregateRule), when in fact the main decorrelation logic (if the rule were not applied) would offer an arguably more beneficial plan:
-- Original (sub)plan
LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{0, 1}])
  LogicalJoin(condition=[=($0, $5)], joinType=[inner])
    LogicalTableScan(table=[[session, partsupp]])
    LogicalProject(p_partkey=[$0])
      LogicalFilter(condition=[LIKE($1, 'forest%':VARCHAR)])
        LogicalTableScan(table=[[session, part]])
  LogicalProject(EXPR$0=[*(0.50, $0)])
    LogicalAggregate(group=[{}], agg#0=[SUM($0)])
      LogicalProject(l_quantity=[$4])
        LogicalFilter(condition=[AND(=($1, $cor0.ps_partkey), =($2, $cor0.ps_suppkey), SEARCH($10, Sarg[[1994-01-01..1995-01-01)]))])
          LogicalTableScan(table=[[session, lineitem]])

-- Decorrelation via RemoveCorrelationForScalarAggregateRule
LogicalProject(ps_partkey=[$0], ps_suppkey=[$1], ps_availqty=[$2], ps_supplycost=[$3], ps_comment=[$4], p_partkey=[$5], $f6=[*(0.50, $6)])
  LogicalAggregate(group=[{0, 1, 2, 3, 4, 5}], agg#0=[SUM($6)])
    LogicalProject(ps_partkey=[$0], ps_suppkey=[$1], ps_availqty=[$2], ps_supplycost=[$3], ps_comment=[$4], p_partkey=[$5], l_quantity=[$10])
      LogicalJoin(condition=[AND(=($7, $0), =($8, $1), SEARCH($16, Sarg[[1994-01-01..1995-01-01)]))], joinType=[left])
        LogicalJoin(condition=[=($0, $5)], joinType=[inner])
          LogicalTableScan(table=[[session, partsupp]])
          LogicalProject(p_partkey=[$0])
            LogicalFilter(condition=[LIKE($1, 'forest%':VARCHAR)])
              LogicalTableScan(table=[[session, part]])
        LogicalProject(l_orderkey=[$0], l_partkey=[$1], l_suppkey=[$2], l_linenumber=[$3], l_quantity=[$4], l_extendedprice=[$5], l_discount=[$6], l_tax=[$7], l_returnflag=[$8], l_linestatus=[$9], l_shipdate=[$10], l_commitdate=[$11], l_receiptdate=[$12], l_shipinstruct=[$13], l_shipmode=[$14], l_comment=[$15], nullIndicator=[true])
          LogicalTableScan(table=[[session, lineitem]])

-- Decorrelation via main logic (without RemoveCorrelationForScalarAggregateRule)
LogicalJoin(condition=[AND(=($0, $6), =($1, $7))], joinType=[left])
  LogicalJoin(condition=[=($0, $5)], joinType=[inner])
    LogicalTableScan(table=[[session, partsupp]])
    LogicalProject(p_partkey=[$0])
      LogicalFilter(condition=[LIKE($1, 'forest%':VARCHAR)])
        LogicalTableScan(table=[[session, part]])
  LogicalAggregate(group=[{0, 1}], agg#0=[SUM($2)])
    LogicalProject(l_partkey=[$1], l_suppkey=[$2], l_quantity=[$4])
      LogicalFilter(condition=[AND(SEARCH($10, Sarg[[1994-01-01..1995-01-01)]), IS NOT NULL($1), IS NOT NULL($2))])
        LogicalTableScan(table=[[session, lineitem]])
The idea of this ticket is to make the rules used by RelDecorrelator#removeCorrelationViaRule configurable. By default, everything will behave as before (the same "default" rules will be applied), which preserves backwards compatibility; in addition, new methods will allow RelDecorrelator's caller to choose which rules are used in this step.
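
To make the intent concrete, here is a rough, self-contained sketch of the proposed shape. It is not Calcite code: plans are modelled as plain strings, and every name in it (DecorrelatorSketch, withDefaultRules, withRules, the two rule constants, the ViaRule/ViaMainLogic tags) is hypothetical and chosen only for illustration. It shows a two-step decorrelator whose first-step rule list defaults to a fixed set but can be replaced by the caller, so that a Correlate skipped by step 1 falls through to the main logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Toy model of a two-step decorrelator with configurable first-step rules.
 * Hypothetical names throughout; this is NOT the Calcite API.
 */
final class DecorrelatorSketch {
  // Stand-ins for the rules that step 1 applies by default.
  // Each "rule" is simply a rewrite of a string-encoded plan.
  static final UnaryOperator<String> SCALAR_AGGREGATE_RULE =
      plan -> plan.replace("Correlate[scalarAgg]", "ViaRule[scalarAgg]");
  static final UnaryOperator<String> SCALAR_PROJECT_RULE =
      plan -> plan.replace("Correlate[scalarProject]", "ViaRule[scalarProject]");

  private final List<UnaryOperator<String>> preRules;

  private DecorrelatorSketch(List<UnaryOperator<String>> preRules) {
    this.preRules = Collections.unmodifiableList(new ArrayList<>(preRules));
  }

  /** The default rule set, so existing callers keep the old behaviour. */
  static List<UnaryOperator<String>> defaultRules() {
    return Arrays.asList(SCALAR_AGGREGATE_RULE, SCALAR_PROJECT_RULE);
  }

  /** Backwards-compatible entry point: step 1 uses the default rules. */
  static DecorrelatorSketch withDefaultRules() {
    return new DecorrelatorSketch(defaultRules());
  }

  /** New entry point: the caller decides which rules step 1 applies. */
  static DecorrelatorSketch withRules(List<UnaryOperator<String>> rules) {
    return new DecorrelatorSketch(rules);
  }

  String decorrelate(String plan) {
    // Step 1: remove some Correlate cases via the configured rules.
    String current = plan;
    for (UnaryOperator<String> rule : preRules) {
      current = rule.apply(current);
    }
    // Step 2: the main logic handles every Correlate that is still present.
    return current.replaceAll("Correlate\\[(\\w+)\\]", "ViaMainLogic[$1]");
  }

  public static void main(String[] args) {
    String plan = "Project(Correlate[scalarAgg])";

    // Default behaviour: the scalar-aggregate rule fires in step 1.
    System.out.println(withDefaultRules().decorrelate(plan));

    // Tuned behaviour: drop that rule so the main logic handles the Correlate.
    List<UnaryOperator<String>> rules = new ArrayList<>(defaultRules());
    rules.remove(SCALAR_AGGREGATE_RULE);
    System.out.println(withRules(rules).decorrelate(plan));
  }
}
```

In this sketch, a caller who prefers the main-logic plan for scalar aggregates copies the default list, removes the one rule, and passes the result in; callers who do nothing keep the previous behaviour. The actual method names and signatures added to RelDecorrelator may differ.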