[CALCITE-3121] VolcanoPlanner hangs due to subquery with dynamic star - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.19.0
Fix Version/s: 1.21.0
Component/s: None
Labels:
- pull-request-available

Description

After the fix for ~~CALCITE-2798~~ some queries hang during planning in VolcanoPlanner (similar issue was reported in ~~CALCITE-2223~~).

Here is a test case which should be added to the RelOptRulesTest class:

  @Test public void testSubQueryWithOrderByHang() {
    String sql = "select n.n_regionkey from ( select * from "
        + "( select * from sales.customer) t order by t.n_regionkey) n where n.n_nationkey >1 ";

    VolcanoPlanner planner = new VolcanoPlanner(null, null);
    planner.addRelTraitDef(ConventionTraitDef.INSTANCE);

    Tester dynamicTester = createDynamicTester().withDecorrelation(true)
        .withClusterFactory(
            relOptCluster -> RelOptCluster.create(planner, relOptCluster.getRexBuilder()));

    RelRoot root = dynamicTester.convertSqlToRel(sql);

    String planBefore = NL + RelOptUtil.toString(root.rel);
    getDiffRepos().assertEquals("planBefore", "${planBefore}", planBefore);

    PushProjector.ExprCondition exprCondition = expr -> {
      if (expr instanceof RexCall) {
        RexCall call = (RexCall) expr;
        return "item".equals(call.getOperator().getName().toLowerCase(Locale.ROOT));
      }
      return false;
    };
    RuleSet ruleSet =
        RuleSets.ofList(
            FilterProjectTransposeRule.INSTANCE,
            FilterMergeRule.INSTANCE,
            ProjectMergeRule.INSTANCE,
            new ProjectFilterTransposeRule(Project.class, Filter .class,
                RelFactories.LOGICAL_BUILDER, exprCondition),
            EnumerableRules.ENUMERABLE_PROJECT_RULE,
            EnumerableRules.ENUMERABLE_FILTER_RULE,
            EnumerableRules.ENUMERABLE_SORT_RULE,
            EnumerableRules.ENUMERABLE_LIMIT_RULE,
            EnumerableRules.ENUMERABLE_TABLE_SCAN_RULE);
    Program program = Programs.of(ruleSet);

    RelTraitSet toTraits =
        root.rel.getCluster().traitSet()
            .replace(0, EnumerableConvention.INSTANCE);

    RelNode relAfter = program.run(planner, root.rel, toTraits,
        Collections.emptyList(), Collections.emptyList());

    String planAfter = NL + RelOptUtil.toString(relAfter);
    getDiffRepos().assertEquals("planAfter", "${planAfter}", planAfter);
  }

Please note that if LIMIT 9999 is added to the sub-query with order by (so it is not removed due to the fix for ~~CALCITE-2798~~) the test succeeds.

Though the issue with hanging is more general, I think that if it wouldn't be fixed, the fix for ~~CALCITE-2798~~ should be reverted to reduce cases when planner may hang.

Attachments

Issue Links

causes

CALCITE-3188 IndexOutOfBoundsException in ProjectFilterTransposeRule when executing SELECT COUNT(*)

Closed

links to

GitHub Pull Request #1264

GitHub Pull Request #1292

Activity

Ascending order - Click to sort in descending order

Haisheng Yuan added a comment - 07/Jun/19 18:11

The issue can be reproduced by the following query:

String sql = "select n.n_regionkey from ( select * from "
        + "( select * from sales.customer) t) n where n.n_nationkey >1 ";

There is nothing wrong with the fix in ~~CALCITE-2798~~.

Haisheng Yuan added a comment - 07/Jun/19 18:11 The issue can be reproduced by the following query: String sql = "select n.n_regionkey from ( select * from " + "( select * from sales.customer) t) n where n.n_nationkey >1 " ; There is nothing wrong with the fix in CALCITE-2798 .

Haisheng Yuan added a comment - 07/Jun/19 18:27

This is stack trace:

 org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)
	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71)
	at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source)
	at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source)
	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225)
	at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132)

Cyclic metadata fetching.

Haisheng Yuan added a comment - 07/Jun/19 18:27 This is stack trace: org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:71) at GeneratedMetadataHandler_RowCount.getRowCount_$(Unknown Source) at GeneratedMetadataHandler_RowCount.getRowCount(Unknown Source) at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:225) at org.apache.calcite.rel.metadata.RelMdRowCount.getRowCount(RelMdRowCount.java:132) Cyclic metadata fetching.

Danny Chen added a comment - 08/Jun/19 04:33

It seems that after some planning rules matched, In ProjectMergeRule, the top project reference a wrong RelSubSet as input, so it just add new project over and over again, and never ends up.

Danny Chen added a comment - 08/Jun/19 04:33 It seems that after some planning rules matched, In ProjectMergeRule, the top project reference a wrong RelSubSet as input, so it just add new project over and over again, and never ends up.

Bohdan Kazydub added a comment - 10/Jun/19 06:36

hyuan, as was described in description, the test succeeds without ~~CALCITE-2798~~ but fails with it. The query you wrote is the same bug, which starts to reproduce for more cases with the changes in ~~CALCITE-2798~~.

Bohdan Kazydub added a comment - 10/Jun/19 06:36 hyuan , as was described in description, the test succeeds without CALCITE-2798 but fails with it. The query you wrote is the same bug, which starts to reproduce for more cases with the changes in CALCITE-2798 .

Bohdan Kazydub added a comment - 10/Jun/19 09:27

Reverted ~~CALCITE-2798~~ and added unit test in pull request.
Can someone please review?

Bohdan Kazydub added a comment - 10/Jun/19 09:27 Reverted CALCITE-2798 and added unit test in pull request . Can someone please review?

Haisheng Yuan added a comment - 10/Jun/19 21:56

I am neutral on reverting previous changes. The bug exists there before ~~CALCITE-2798~~, but ~~CALCITE-2798~~ unveiled the issue, for which we don't have a quick fix yet. julianhyde What's your opinion?

Haisheng Yuan added a comment - 10/Jun/19 21:56 I am neutral on reverting previous changes. The bug exists there before CALCITE-2798 , but CALCITE-2798 unveiled the issue, for which we don't have a quick fix yet. julianhyde What's your opinion?

Vova Vysotskyi added a comment - 11/Jun/19 17:36

If there are no objections, I'll merge the PR soon.

Vova Vysotskyi added a comment - 11/Jun/19 17:36 If there are no objections, I'll merge the PR soon.

Julian Hyde added a comment - 11/Jun/19 22:14

No one has made a case for why the behavior was better (or worse) before ~~CALCITE-2798~~. Just arguments along the lines of "well, it works for me".

Julian Hyde added a comment - 11/Jun/19 22:14 No one has made a case for why the behavior was better (or worse) before CALCITE-2798 . Just arguments along the lines of "well, it works for me".

Vova Vysotskyi added a comment - 12/Jun/19 10:07

The aim of ~~CALCITE-2798~~ was to optimize queries by removing unnecessary ORDER BY clauses, but as was mentioned above, it caused the hanging of some queries (due to another, more general issue).

From the functionality perspective, this commit caused a regression, and UT attached to the Jira proves that, so the behavior without the fix for ~~CALCITE-2798~~ was better - volcano planner was able to plan some queries which hang now, I think it is more important than the possibility to produce optimization.

The ideal solution is to fix the original issue which causes hanging with or without ORDER BY clauses in sub-queries (perhaps it is the same as ~~CALCITE-2223~~), but taking into account upcoming release and its timings, revert is an acceptable solution from my point of view.

Vova Vysotskyi added a comment - 12/Jun/19 10:07 The aim of CALCITE-2798 was to optimize queries by removing unnecessary ORDER BY clauses, but as was mentioned above, it caused the hanging of some queries (due to another, more general issue). From the functionality perspective, this commit caused a regression, and UT attached to the Jira proves that, so the behavior without the fix for CALCITE-2798 was better - volcano planner was able to plan some queries which hang now, I think it is more important than the possibility to produce optimization. The ideal solution is to fix the original issue which causes hanging with or without ORDER BY clauses in sub-queries (perhaps it is the same as CALCITE-2223 ), but taking into account upcoming release and its timings, revert is an acceptable solution from my point of view.

Danny Chen added a comment - 12/Jun/19 11:19 - edited

If we change the sql to

String sql = "select n.n_regionkey from ( select * from "
    + "( select n_regionkey, n_nationkey from sales.customer) t) n where n.n_nationkey >1 ";

The query will not hang, so i believe it is the ProjectMergeRule that can not handle the pattern correctly,

project
|-filter
|---project(star(*))
|-----scan

Danny Chen added a comment - 12/Jun/19 11:19 - edited If we change the sql to String sql = "select n.n_regionkey from ( select * from " + "( select n_regionkey, n_nationkey from sales.customer) t) n where n.n_nationkey >1 " ; The query will not hang, so i believe it is the ProjectMergeRule that can not handle the pattern correctly, project |-filter |---project(star(*)) |-----scan

Vova Vysotskyi added a comment - 13/Jun/19 18:38

julianhyde, can I merge this PR to include it into the release?

Vova Vysotskyi added a comment - 13/Jun/19 18:38 julianhyde , can I merge this PR to include it into the release?

Danny Chen added a comment - 02/Jul/19 06:15 - edited

Well, finally i got the reason, now the ProjectFilterTransposeRule will change plan:

<!-- plan-1 -->
LogicalProject(**=[$0])
  LogicalFilter(subset=[rel#40:Subset#6.NONE], condition=[>(ITEM($0, 'N_NATIONKEY'), 1)])
    LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])

to the form like:

<!-- plan-2 -->
LogicalProject(**=[$0])                                      (1)
  LogicalFilter(condition=[>($1, 1)])                        (2)
    LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])  (3)
      LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])

and the plan:

<!-- plan-3 -->
LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])
  LogicalFilter(subset=[rel#15:Subset#2.NONE], condition=[>(ITEM($0, 'N_NATIONKEY'), 1)])
    LogicalProject(subset=[rel#13:Subset#1.NONE], **=[$0])
      LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])

to the form like:

<!-- plan-4 -->
LogicalFilter(condition=[>($1, 1)])                         (4)
  LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY')])
    LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]])

See that (4) has same digest as (2), while these 4 plans are actually in the same RelSet, so this forms a circle.

The plan-2 has more cost that the old one, because it always add a redundant project (3), if we also have FilterProjectTransposeRule in the rule set, it will also trigger infinite match of the ProjectMergeRule for project (1) and new project (3).

Actually this ProjectFilterTransposeRule transformation is meaningless, because it just add a useless project in the plan compared to the old one, so i just skip conversion for such pattern.

Danny Chen added a comment - 02/Jul/19 06:15 - edited Well, finally i got the reason, now the ProjectFilterTransposeRule will change plan:  LogicalProject(**=[$0]) LogicalFilter(subset=[rel#40:Subset#6.NONE], condition=[>(ITEM($0, 'N_NATIONKEY' ), 1)]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) to the form like:  LogicalProject(**=[$0]) (1) LogicalFilter(condition=[>($1, 1)]) (2) LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) (3) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) and the plan:  LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) LogicalFilter(subset=[rel#15:Subset#2.NONE], condition=[>(ITEM($0, 'N_NATIONKEY' ), 1)]) LogicalProject(subset=[rel#13:Subset#1.NONE], **=[$0]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) to the form like:  LogicalFilter(condition=[>($1, 1)]) (4) LogicalProject(**=[$0], ITEM=[ITEM($0, 'N_NATIONKEY' )]) LogicalTableScan(subset=[rel#11:Subset#0.NONE], table=[[CATALOG, SALES, CUSTOMER]]) See that (4) has same digest as (2), while these 4 plans are actually in the same RelSet, so this forms a circle. The plan-2 has more cost that the old one, because it always add a redundant project (3), if we also have FilterProjectTransposeRule in the rule set, it will also trigger infinite match of the ProjectMergeRule for project (1) and new project (3). Actually this ProjectFilterTransposeRule transformation is meaningless, because it just add a useless project in the plan compared to the old one, so i just skip conversion for such pattern.

Danny Chen added a comment - 04/Jul/19 01:43

Fixed in 0ed54bd !

Danny Chen added a comment - 04/Jul/19 01:43 Fixed in 0ed54bd !

Stamatis Zampetakis added a comment - 10/Sep/19 21:37

Resolved in release 1.21.0 (2019-09-12).

Stamatis Zampetakis added a comment - 10/Sep/19 21:37 Resolved in release 1.21.0 (2019-09-12).

People

Assignee:: Danny Chen

Reporter:: Bohdan Kazydub

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 07/Jun/19 17:34

Updated:: 10/Sep/19 21:37

Resolved:: 04/Jul/19 01:43

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

Calcite