[CALCITE-4494] Improve planning performance with RelSubset check for Rel presence - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.26.0
Fix Version/s: 1.27.0
Component/s: core
Labels:
- performance
- pull-request-available
Environment:

All environments

Description

Problem

Currently, planning is slower than in version 1.25. The longer planning time seems to affect most queries, but it is most visible for queries with many Rel nodes, especially those with multiple joins.

In a downstream project, we have a stress test that checks the planning time. In some cases, the planning time increases by a factor of 4 (for a query with 28 joins).

The main contributing factor (but not the only one) for the slow-down could be traced to https://github.com/apache/calcite/pull/2222/files.

Potential Solution

As the reviewers mentioned, a few small changes may improve the current situation:

* Introduce a method to check that a Rel node belongs to the RelSubset instead of getting all Rel nodes (the current code may take up to 60% of the planning time).

* Improve the null check in RelMdPredicates by building an error message in RelMdPredicates.ExprsItr only when it is required (may additionally take 10% of the planning time due to SortedMap.toString() being expensive when frequently called).
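
The first change can be illustrated with a minimal sketch. It does not use Calcite's classes: `Subset`, `register`, `getNodeList`, and `contains` are hypothetical stand-ins, and nodes are modeled as plain strings with a trait label. The point is only that a membership question can be answered directly instead of materializing the whole list first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubsetMembershipSketch {
  /** Hypothetical stand-in for a subset: the nodes of an equivalence set that carry one trait. */
  static final class Subset {
    // Node name -> trait label for every node in the (hypothetical) equivalence set.
    private final Map<String, String> nodeTraits = new HashMap<>();
    private final String trait;

    Subset(String trait) {
      this.trait = trait;
    }

    void register(String node, String nodeTrait) {
      nodeTraits.put(node, nodeTrait);
    }

    /** Materializes every matching node: a full scan and a fresh list on each call. */
    List<String> getNodeList() {
      List<String> result = new ArrayList<>();
      for (Map.Entry<String, String> entry : nodeTraits.entrySet()) {
        if (trait.equals(entry.getValue())) {
          result.add(entry.getKey());
        }
      }
      return result;
    }

    /** Answers the membership question with one lookup and no allocation. */
    boolean contains(String node) {
      return trait.equals(nodeTraits.get(node));
    }
  }

  public static void main(String[] args) {
    Subset subset = new Subset("sorted");
    subset.register("scan", "sorted");
    subset.register("filter", "unsorted");

    // Both expressions give the same answer; only the second avoids building a list.
    System.out.println(subset.getNodeList().contains("scan"));
    System.out.println(subset.contains("scan"));
  }
}
```

When such a check runs on every rule match, the list-building variant repeats the scan and the allocation each time, which is the kind of cost the first bullet describes.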

With these 2 changes, I was able to regain most of the lost planning performance.
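
The second change, building the error message only when it is required, can be sketched as follows. This is an illustration under assumptions rather than the code in RelMdPredicates: `requireEager`, `requireLazy`, `CountingMap`, and the message text are hypothetical, and the sketch relies only on the standard `Objects.requireNonNull` overloads.

```java
import java.util.Objects;
import java.util.SortedMap;
import java.util.TreeMap;

final class LazyMessageSketch {
  /** Hypothetical helper map that counts how often toString() is invoked. */
  static final class CountingMap extends TreeMap<Integer, String> {
    int toStringCalls = 0;

    @Override public String toString() {
      toStringCalls++;
      return super.toString();
    }
  }

  /** Eager variant: the message, and with it the map's toString(), is built on every call. */
  static <T> T requireEager(T value, SortedMap<Integer, String> context) {
    return Objects.requireNonNull(value, "no expression found in " + context);
  }

  /** Lazy variant: the supplier runs only when the value is actually null. */
  static <T> T requireLazy(T value, SortedMap<Integer, String> context) {
    return Objects.requireNonNull(value, () -> "no expression found in " + context);
  }

  public static void main(String[] args) {
    CountingMap context = new CountingMap();
    context.put(1, "a");

    requireEager("value", context);
    System.out.println("toString() calls after eager check: " + context.toStringCalls);

    requireLazy("value", context);
    System.out.println("toString() calls after lazy check: " + context.toStringCalls);
  }
}
```

In the common case where the value is present, the lazy variant never formats the map, so the cost of `SortedMap.toString()` is paid only on the failure path.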

The flame graph in the attachment CalcitePerf_Planning_RelList_consumes_a_lot.png shows that the call to RelSubset.getRelList() from VolcanoRuleCall.onMatch() is expensive for the 28-join query.

The flame graph taken after the proposed improvements, for the same 28-join query, is attached as CalcitePerf_Planning_after_improvements.png.

It is clear that the HintStrategyTable.isRuleExcluded() call is expensive, but the overall picture is much better.

Also, in my environment, the TPC-H Q7 test takes about 17% less time after the proposed improvements (32.9 sec, down from 39.6 sec). The flame graph for this test (attachment CalcitePerf_Planning_TPCH_Q7_RelList_consumes_a_lot.png) shows that ordinary queries are also affected by the redundant RelSubset.getRelList() calls.

Attachments


- CalcitePerf_Planning_after_improvements.png (151 kB, 11/Feb/21 17:21, Igor Lozynskyi)
- CalcitePerf_Planning_RelList_consumes_a_lot.png (190 kB, 11/Feb/21 17:20, Igor Lozynskyi)
- CalcitePerf_Planning_TPCH_Q7_RelList_consumes_a_lot.png (567 kB, 11/Feb/21 17:25, Igor Lozynskyi)

Issue Links

links to

GitHub Pull Request #2347

People

Assignee: Igor Lozynskyi

Reporter: Igor Lozynskyi

Votes: 0

Watchers: 4

Dates

Created: 11/Feb/21 17:27

Updated: 03/Jun/21 22:31

Resolved: 15/Feb/21 09:29

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 40m