Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-4414

RelMdSelectivity#getSelectivity for Calc can propagate a predicate with wrong references

Details

    Description

      RelMdSelectivity#getSelectivity(Calc rel, RelMetadataQuery mq, RexNode predicate) method:

        public Double getSelectivity(Calc rel, RelMetadataQuery mq, RexNode predicate) {
          final RexProgram rexProgram = rel.getProgram();
          final RexLocalRef programCondition = rexProgram.getCondition();
          if (programCondition == null) {
            return getSelectivity(rel.getInput(), mq, predicate); // [2]
          } else {
            // [1]
            return mq.getSelectivity(rel.getInput(),
                RelMdUtil.minusPreds(
                    rel.getCluster().getRexBuilder(),
                    predicate,
                    rexProgram.expandLocalRef(programCondition)));
          }
        }
      

      currently passes down the predicate to its input [1] without considering any possible translation, since the predicate might include expressions generated by the Calc's projections; hence when the Calc's input analyzes the predicate, it can end up trying to access fields that do not exist on its rowType.

      This can lead to unforeseeable consequences, like the test attached to the first comment, where after RelMdSelectivity#getSelectivity(Calc) we reach RelMdSelectivity#getSelectivity(Union) and this method ends up in an ArrayIndexOutOfBoundsException because it tries to access a field ($1) that does not exists on its rowType (which only has $0). This $1 is actually projected by the Calc which is on top of the Union.

      Note I: in a similar situation, RelM uses{{ RelOptUtil.pushPastProject}} (which "Converts an expression that is based on the output fields of a Project to an equivalent expression on the Project's input fields.") to convert the predicate before passing it to the Project's input.

      Note II: in the code snipped above that in our test example the issue only happens in line [1], and not in [2] because the "if" block calls getSelectivity instead of mq.getSelectivity, although I find this a bit questionable and maybe mq.getSelectivity should be called here as well.

      Attachments

        Issue Links

          Activity

            rubenql Ruben Q L added a comment - - edited

            Test:

              @Test void test() {
                final RelBuilder builder = RelBuilder.create(RelBuilderTest.config().build());
                final RelNode relNode = builder
                    .scan("EMP")
                    .project(
                        builder.field("DEPTNO"))
                    .scan("EMP")
                    .project(
                        builder.field("DEPTNO"))
                    .union(true)
                    .projectPlus(builder.field("DEPTNO"))
                    .filter(
                        builder.equals(
                            builder.field(0),
                            builder.literal(0)))
                    .build();
            
                // Program to convert Project + Filter into a single Calc
                final HepProgram program = new HepProgramBuilder()
                    .addRuleInstance(CoreRules.FILTER_TO_CALC)
                    .addRuleInstance(CoreRules.PROJECT_TO_CALC)
                    .addRuleInstance(CoreRules.CALC_MERGE)
                    .build();
                final HepPlanner hepPlanner = new HepPlanner(program);
                hepPlanner.setRoot(relNode);
                RelNode output = hepPlanner.findBestExp();
            
                // Add filter on the extra field generated by projectPlus (now a Calc after hepPlanner)
                output = builder
                    .push(output)
                    .filter(
                        builder.equals(
                            builder.field(1),
                            builder.literal(0)))
                    .build();
            
                // Should not fail
                output.estimateRowCount(output.getCluster().getMetadataQuery());
              }
            

            Fails with:

            java.lang.ArrayIndexOutOfBoundsException: 1
            	at org.apache.calcite.plan.RelOptUtil$RexInputConverter.visitInputRef(RelOptUtil.java:4452)
            	at org.apache.calcite.plan.RelOptUtil$RexInputConverter.visitInputRef(RelOptUtil.java:4372)
            	at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
            	at org.apache.calcite.rex.RexShuttle.visitList(RexShuttle.java:158)
            	at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:110)
            	at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:33)
            	at org.apache.calcite.rex.RexCall.accept(RexCall.java:175)
            	at org.apache.calcite.rel.metadata.RelMdSelectivity.getSelectivity(RelMdSelectivity.java:80)
            	at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source)
            	at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source)
            	at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426)
            	at org.apache.calcite.rel.metadata.RelMdSelectivity.getSelectivity(RelMdSelectivity.java:131)
            	at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source)
            	at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source)
            	at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426)
            	at org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:767)
            	at org.apache.calcite.rel.core.Filter.estimateRowCount(Filter.java:139)
            	at org.apache.calcite.test.RelMdSelectivityTest.test(RelMdSelectivityTest.java:72)
            

            The plan that we are dealing with at the end of the test:

            LogicalFilter(condition=[=($1, 0)])
              LogicalCalc(expr#0=[{inputs}], expr#1=[0], expr#2=[=($t0, $t1)], DEPTNO=[$t0], DEPTNO0=[$t0], $condition=[$t2])
                LogicalUnion(all=[true])
                  LogicalCalc(expr#0..7=[{inputs}], DEPTNO=[$t7])
                    LogicalTableScan(table=[[scott, EMP]])
                  LogicalCalc(expr#0..7=[{inputs}], DEPTNO=[$t7])
                    LogicalTableScan(table=[[scott, EMP]])
            
            rubenql Ruben Q L added a comment - - edited Test: @Test void test() { final RelBuilder builder = RelBuilder.create(RelBuilderTest.config().build()); final RelNode relNode = builder .scan( "EMP" ) .project( builder.field( "DEPTNO" )) .scan( "EMP" ) .project( builder.field( "DEPTNO" )) .union( true ) .projectPlus(builder.field( "DEPTNO" )) .filter( builder.equals( builder.field(0), builder.literal(0))) .build(); // Program to convert Project + Filter into a single Calc final HepProgram program = new HepProgramBuilder() .addRuleInstance(CoreRules.FILTER_TO_CALC) .addRuleInstance(CoreRules.PROJECT_TO_CALC) .addRuleInstance(CoreRules.CALC_MERGE) .build(); final HepPlanner hepPlanner = new HepPlanner(program); hepPlanner.setRoot(relNode); RelNode output = hepPlanner.findBestExp(); // Add filter on the extra field generated by projectPlus (now a Calc after hepPlanner) output = builder .push(output) .filter( builder.equals( builder.field(1), builder.literal(0))) .build(); // Should not fail output.estimateRowCount(output.getCluster().getMetadataQuery()); } Fails with: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.calcite.plan.RelOptUtil$RexInputConverter.visitInputRef(RelOptUtil.java:4452) at org.apache.calcite.plan.RelOptUtil$RexInputConverter.visitInputRef(RelOptUtil.java:4372) at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) at org.apache.calcite.rex.RexShuttle.visitList(RexShuttle.java:158) at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:110) at org.apache.calcite.rex.RexShuttle.visitCall(RexShuttle.java:33) at org.apache.calcite.rex.RexCall.accept(RexCall.java:175) at org.apache.calcite.rel.metadata.RelMdSelectivity.getSelectivity(RelMdSelectivity.java:80) at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source) at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426) at org.apache.calcite.rel.metadata.RelMdSelectivity.getSelectivity(RelMdSelectivity.java:131) at GeneratedMetadataHandler_Selectivity.getSelectivity_$(Unknown Source) at GeneratedMetadataHandler_Selectivity.getSelectivity(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getSelectivity(RelMetadataQuery.java:426) at org.apache.calcite.rel.metadata.RelMdUtil.estimateFilteredRows(RelMdUtil.java:767) at org.apache.calcite.rel.core.Filter.estimateRowCount(Filter.java:139) at org.apache.calcite.test.RelMdSelectivityTest.test(RelMdSelectivityTest.java:72) The plan that we are dealing with at the end of the test: LogicalFilter(condition=[=($1, 0)]) LogicalCalc(expr#0=[{inputs}], expr#1=[0], expr#2=[=($t0, $t1)], DEPTNO=[$t0], DEPTNO0=[$t0], $condition=[$t2]) LogicalUnion(all=[true]) LogicalCalc(expr#0..7=[{inputs}], DEPTNO=[$t7]) LogicalTableScan(table=[[scott, EMP]]) LogicalCalc(expr#0..7=[{inputs}], DEPTNO=[$t7]) LogicalTableScan(table=[[scott, EMP]])
            rubenql Ruben Q L added a comment -

            julianhyde, if I am not mistaken, you implemented RelMdSelectivity#getSelectivity for Project; maybe you could take a look at this patch involving Calc, please?

            rubenql Ruben Q L added a comment - julianhyde , if I am not mistaken, you implemented RelMdSelectivity#getSelectivity for Project; maybe you could take a look at this patch involving Calc, please?
            rubenql Ruben Q L added a comment - Fixed via https://github.com/apache/calcite/commit/b4e399cb35224d8c8d55f02b7cf2b9649a3b28a4

            Resolved in release 1.27.0 (2021-06-03)

            zabetak Stamatis Zampetakis added a comment - Resolved in release 1.27.0 (2021-06-03)

            People

              rubenql Ruben Q L
              rubenql Ruben Q L
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m