Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-3121

VolcanoPlanner hangs due to subquery with dynamic star

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 1.19.0
    • 1.21.0
    • None

    Description

      After the fix for CALCITE-2798 some queries hang during planning in VolcanoPlanner (similar issue was reported in CALCITE-2223).

      Here is a test case which should be added to the RelOptRulesTest class:

        @Test public void testSubQueryWithOrderByHang() {
          String sql = "select n.n_regionkey from ( select * from "
              + "( select * from sales.customer) t order by t.n_regionkey) n where n.n_nationkey >1 ";
      
          VolcanoPlanner planner = new VolcanoPlanner(null, null);
          planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
      
          Tester dynamicTester = createDynamicTester().withDecorrelation(true)
              .withClusterFactory(
                  relOptCluster -> RelOptCluster.create(planner, relOptCluster.getRexBuilder()));
      
          RelRoot root = dynamicTester.convertSqlToRel(sql);
      
          String planBefore = NL + RelOptUtil.toString(root.rel);
          getDiffRepos().assertEquals("planBefore", "${planBefore}", planBefore);
      
          PushProjector.ExprCondition exprCondition = expr -> {
            if (expr instanceof RexCall) {
              RexCall call = (RexCall) expr;
              return "item".equals(call.getOperator().getName().toLowerCase(Locale.ROOT));
            }
            return false;
          };
          RuleSet ruleSet =
              RuleSets.ofList(
                  FilterProjectTransposeRule.INSTANCE,
                  FilterMergeRule.INSTANCE,
                  ProjectMergeRule.INSTANCE,
                  new ProjectFilterTransposeRule(Project.class, Filter .class,
                      RelFactories.LOGICAL_BUILDER, exprCondition),
                  EnumerableRules.ENUMERABLE_PROJECT_RULE,
                  EnumerableRules.ENUMERABLE_FILTER_RULE,
                  EnumerableRules.ENUMERABLE_SORT_RULE,
                  EnumerableRules.ENUMERABLE_LIMIT_RULE,
                  EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
          Program program = Programs.of(ruleSet);
      
          RelTraitSet toTraits =
              root.rel.getCluster().traitSet()
                  .replace(0, EnumerableConvention.INSTANCE);
      
          RelNode relAfter = program.run(planner, root.rel, toTraits,
              Collections.emptyList(), Collections.emptyList());
      
          String planAfter = NL + RelOptUtil.toString(relAfter);
          getDiffRepos().assertEquals("planAfter", "${planAfter}", planAfter);
        }
      

      Please note that if LIMIT 9999 is added to the sub-query with order by (so it is not removed due to the fix for CALCITE-2798) the test succeeds.

      Though the issue with hanging is more general, I think that if it wouldn't be fixed, the fix for CALCITE-2798 should be reverted to reduce cases when planner may hang.

      Attachments

        Issue Links

          Activity

            hyuan Haisheng Yuan added a comment -

            The issue can be reproduced by the following query:

            String sql = "select n.n_regionkey from ( select * from "
                    + "( select * from sales.customer) t) n where n.n_nationkey >1 ";
            

            There is nothing wrong with the fix in CALCITE-2798.

            hyuan Haisheng Yuan added a comment - The issue can be reproduced by the following query: String sql = "select n.n_regionkey from ( select * from " + "( select * from sales.customer) t) n where n.n_nationkey >1 " ; There is nothing wrong with the fix in CALCITE-2798 .
            hyuan Haisheng Yuan added a comment -

            This is stack trace:

             org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
            	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
            	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
            	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
            	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
            	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
            	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
            	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
            	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
            	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
            	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
            	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
            	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
            

            Cyclic metadata fetching.

            hyuan Haisheng Yuan added a comment - This is stack trace: org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132) Cyclic metadata fetching.
            danny0405 Danny Chen added a comment -

            It seems that after some planning rules matched, In ProjectMergeRule, the top project reference a wrong RelSubSet as input, so it just add new project over and over again, and never ends up.

            danny0405 Danny Chen added a comment - It seems that after some planning rules matched, In ProjectMergeRule, the top project reference a wrong RelSubSet as input, so it just add new project over and over again, and never ends up.
            bohdan Bohdan Kazydub added a comment -

            hyuan, as was described in description, the test succeeds without CALCITE-2798 but fails with it. The query you wrote is the same bug, which starts to reproduce for more cases with the changes in CALCITE-2798.

            bohdan Bohdan Kazydub added a comment - hyuan , as was described in description, the test succeeds without CALCITE-2798 but fails with it. The query you wrote is the same bug, which starts to reproduce for more cases with the changes in CALCITE-2798 .
            bohdan Bohdan Kazydub added a comment -

            Reverted CALCITE-2798 and added unit test in pull request.
            Can someone please review?

            bohdan Bohdan Kazydub added a comment - Reverted CALCITE-2798 and added unit test in pull request . Can someone please review?
            hyuan Haisheng Yuan added a comment -

            I am neutral on reverting previous changes. The bug exists there before CALCITE-2798, but CALCITE-2798 unveiled the issue, for which we don't have a quick fix yet. julianhyde What's your opinion?

            hyuan Haisheng Yuan added a comment - I am neutral on reverting previous changes. The bug exists there before CALCITE-2798 , but CALCITE-2798 unveiled the issue, for which we don't have a quick fix yet. julianhyde What's your opinion?
            volodymyr Vova Vysotskyi added a comment -

            If there are no objections, I'll merge the PR soon.

            volodymyr Vova Vysotskyi added a comment - If there are no objections, I'll merge the PR soon.
            julianhyde Julian Hyde added a comment -

            No one has made a case for why the behavior was better (or worse) before CALCITE-2798. Just arguments along the lines of "well, it works for me".

            julianhyde Julian Hyde added a comment - No one has made a case for why the behavior was better (or worse) before CALCITE-2798 . Just arguments along the lines of "well, it works for me".
            volodymyr Vova Vysotskyi added a comment -

            The aim of CALCITE-2798 was to optimize queries by removing unnecessary ORDER BY clauses, but as was mentioned above, it caused the hanging of some queries (due to another, more general issue).

            From the functionality perspective, this commit caused a regression, and UT attached to the Jira proves that, so the behavior without the fix for CALCITE-2798 was better - volcano planner was able to plan some queries which hang now, I think it is more important than the possibility to produce optimization.

            The ideal solution is to fix the original issue which causes hanging with or without ORDER BY clauses in sub-queries (perhaps it is the same as CALCITE-2223), but taking into account upcoming release and its timings, revert is an acceptable solution from my point of view.

            volodymyr Vova Vysotskyi added a comment - The aim of CALCITE-2798 was to optimize queries by removing unnecessary ORDER BY clauses, but as was mentioned above, it caused the hanging of some queries (due to another, more general issue). From the functionality perspective, this commit caused a regression, and UT attached to the Jira proves that, so the behavior without the fix for CALCITE-2798 was better - volcano planner was able to plan some queries which hang now, I think it is more important than the possibility to produce optimization. The ideal solution is to fix the original issue which causes hanging with or without ORDER BY clauses in sub-queries (perhaps it is the same as CALCITE-2223 ), but taking into account upcoming release and its timings, revert is an acceptable solution from my point of view.
            danny0405 Danny Chen added a comment - - edited

            If we change the sql to

            String sql = "select n.n_regionkey from ( select * from "
                + "( select n_regionkey, n_nationkey from sales.customer) t) n where n.n_nationkey >1 ";
            

            The query will not hang, so i believe it is the ProjectMergeRule that can not handle the pattern correctly,

            project
            |-filter
            |---project(star(*))
            |-----scan
            danny0405 Danny Chen added a comment - - edited If we change the sql to String sql = "select n.n_regionkey from ( select * from " + "( select n_regionkey, n_nationkey from sales.customer) t) n where n.n_nationkey >1 " ; The query will not hang, so i believe it is the ProjectMergeRule that can not handle the pattern correctly, project |-filter |---project(star(*)) |-----scan
            volodymyr Vova Vysotskyi added a comment -

            julianhyde, can I merge this PR to include it into the release?

            volodymyr Vova Vysotskyi added a comment - julianhyde , can I merge this PR to include it into the release?
            danny0405 Danny Chen added a comment - - edited

            Well, finally i got the reason, now the ProjectFilterTransposeRule will change plan:

            <!-- plan-1 -->
            LogicalProject(**=[$0])
              LogicalFilter(subset=[rel#40:Subset#6.NONE], condition=[>(ITEM($0, 'N_NATIONKEY'), 1)])
                LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])
            

            to the form like:

            <!-- plan-2 -->
            LogicalProject(**=[$0])                                      (1)
              LogicalFilter(condition=[>($1, 1)])                        (2)
                LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])  (3)
                  LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])
            

            and the plan:

            <!-- plan-3 -->
            LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])
              LogicalFilter(subset=[rel#15:Subset#2.NONE], condition=[>(ITEM($0, 'N_NATIONKEY'), 1)])
                LogicalProject(subset=[rel#13:Subset#1.NONE], **=[$0])
                  LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])
            

            to the form like:

            <!-- plan-4 -->
            LogicalFilter(condition=[>($1, 1)])                         (4)
              LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])
                LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])

            See that (4) has same digest as (2), while these 4 plans are actually in the same RelSet, so this forms a circle.

            The plan-2 has more cost that the old one, because it always add a redundant project (3), if we also have FilterProjectTransposeRule in the rule set, it will also trigger infinite match of the ProjectMergeRule for project (1) and new project (3).

            Actually this ProjectFilterTransposeRule transformation is meaningless, because it just add a useless project in the plan compared to the old one, so i just skip conversion for such pattern.

            danny0405 Danny Chen added a comment - - edited Well, finally i got the reason, now the ProjectFilterTransposeRule will change plan: <!-- plan-1 --> LogicalProject(**=[$0]) LogicalFilter(subset=[rel#40:Subset#6.NONE], condition=[>(ITEM($0, 'N_NATIONKEY' ), 1)]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) to the form like: <!-- plan-2 --> LogicalProject(**=[$0]) (1) LogicalFilter(condition=[>($1, 1)]) (2) LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) (3) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) and the plan: <!-- plan-3 --> LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) LogicalFilter(subset=[rel#15:Subset#2.NONE], condition=[>(ITEM($0, 'N_NATIONKEY' ), 1)]) LogicalProject(subset=[rel#13:Subset#1.NONE], **=[$0]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) to the form like: <!-- plan-4 --> LogicalFilter(condition=[>($1, 1)]) (4) LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) See that (4) has same digest as (2), while these 4 plans are actually in the same RelSet, so this forms a circle. The plan-2 has more cost that the old one, because it always add a redundant project (3), if we also have FilterProjectTransposeRule in the rule set, it will also trigger infinite match of the ProjectMergeRule for project (1) and new project (3). Actually this ProjectFilterTransposeRule transformation is meaningless, because it just add a useless project in the plan compared to the old one, so i just skip conversion for such pattern.
            danny0405 Danny Chen added a comment -

            Fixed in 0ed54bd !

            danny0405 Danny Chen added a comment - Fixed in 0ed54bd !

            Resolved in release 1.21.0 (2019-09-12).

            zabetak Stamatis Zampetakis added a comment - Resolved in release 1.21.0 (2019-09-12).

            People

              danny0405 Danny Chen
              bohdan Bohdan Kazydub
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h
                  1h