[HIVE-16757] Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0
Component/s: Query Planning
Labels: None
Target Version/s: 3.0.0

Description

Calling Calcite's RelMetadataQuery.instance() is very expensive because it places a new memoization cache on the stack. Hidden in the deprecated AbstractRelNode.getRows() call is a call to instance(). In Hive we have a number of places where we call the deprecated getRows() instead of the new API estimateRowCount(RelMetadataQuery mq), which accepts the RelMetadataQuery; in most of those places we already have one at hand to pass. Looking at a complex query (49 joins), there are 2995340 calls to AbstractRelNode.getRows, each one throwing away the current memoization cache.
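The effect described above can be illustrated with a small, self-contained sketch. It does not use Calcite: ToyMetadataQuery, ToyRel and RowCountDemo are hypothetical stand-ins, and the assumption that every join consults each input's row count twice is made up purely so that repeated lookups show up in the counter. The only point is the contrast between creating a fresh cache on every call and passing the existing query object down.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a metadata query object that owns a memoization cache. */
final class ToyMetadataQuery {
    /** Counts how many times a row count was actually computed (cache misses). */
    static long computations = 0;

    private final Map<ToyRel, Double> cache = new HashMap<>();

    /** Mirrors the costly pattern: every call starts with an empty cache. */
    static ToyMetadataQuery instance() {
        return new ToyMetadataQuery();
    }

    double getRowCount(ToyRel rel) {
        Double cached = cache.get(rel);
        if (cached != null) {
            return cached;
        }
        computations++;
        double rows = rel.computeRowCount(this);
        cache.put(rel, rows);
        return rows;
    }
}

/** Hypothetical plan node: either a leaf scan or a join of two inputs. */
final class ToyRel {
    private final ToyRel left;
    private final ToyRel right;
    private final double leafRows;
    private final boolean useDeprecatedCall;

    ToyRel(double leafRows) {
        this(null, null, leafRows, false);
    }

    ToyRel(ToyRel left, ToyRel right, boolean useDeprecatedCall) {
        this(left, right, 0, useDeprecatedCall);
    }

    private ToyRel(ToyRel left, ToyRel right, double leafRows, boolean useDeprecatedCall) {
        this.left = left;
        this.right = right;
        this.leafRows = leafRows;
        this.useDeprecatedCall = useDeprecatedCall;
    }

    /** Old style: no query argument, so a fresh cache is created on every call. */
    double getRows() {
        return estimateRowCount(ToyMetadataQuery.instance());
    }

    /** New style: the caller passes the query object it already holds. */
    double estimateRowCount(ToyMetadataQuery mq) {
        return mq.getRowCount(this);
    }

    double computeRowCount(ToyMetadataQuery mq) {
        if (left == null) {
            return leafRows;
        }
        // Assumption for the demo: each input is consulted twice,
        // once for the product and once for a selectivity guess.
        double product = rows(left, mq) * rows(right, mq);
        double selectivity = 1.0 / Math.max(rows(left, mq), rows(right, mq));
        return product * selectivity;
    }

    private double rows(ToyRel input, ToyMetadataQuery mq) {
        return useDeprecatedCall ? input.getRows() : input.estimateRowCount(mq);
    }
}

final class RowCountDemo {
    /** Builds a left-deep chain of joins and returns how many row counts were computed. */
    static long computationsFor(int joins, boolean useDeprecatedCall) {
        ToyRel plan = new ToyRel(100);
        for (int i = 0; i < joins; i++) {
            plan = new ToyRel(plan, new ToyRel(100), useDeprecatedCall);
        }
        ToyMetadataQuery.computations = 0;
        plan.estimateRowCount(ToyMetadataQuery.instance());
        return ToyMetadataQuery.computations;
    }

    public static void main(String[] args) {
        System.out.println("shared query:         " + computationsFor(10, false));
        System.out.println("fresh query per call: " + computationsFor(10, true));
    }
}
```

In this toy model the shared query computes each node once, so the work grows linearly with the number of joins, while the fresh-query variant recomputes every subtree on each lookup and grows exponentially with the depth of the chain.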

Was: On complex queries HiveRelMdRowCount.getRowCount can get called many times. Since it does not memoize its result and the call is recursive, this results in an explosion of calls. For example, for a query with 49 joins, during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount gets called 6442 times as a top-level call, but the recursion expands this to 501729 calls. Memoization of the result would stop the recursion early. In my testing this reduced the join reordering time for said query from 11s to <1s.

Note there is no need for HiveRelMdRowCount memoization because the function is called in stacks similar to this:

	at org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
	at GeneratedMetadataHandler_RowCount.getRowCount_$
	at GeneratedMetadataHandler_RowCount.getRowCount
	at org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
	at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
	at org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)

and GeneratedMetadataHandler_RowCount.getRowCount handles memoization.

Attachments

HIVE-16757.01.patch  25/May/17 13:00  4 kB  Remus Rusanu
HIVE-16757.02.patch  25/May/17 17:58  37 kB  Remus Rusanu
HIVE-16757.03.patch  28/May/17 20:54  5 kB  Remus Rusanu
HIVE-16757.04.patch  29/May/17 07:58  9 kB  Remus Rusanu
HIVE-16757.05.patch  30/May/17 19:30  9 kB  Remus Rusanu
HIVE-16757.06.patch  30/May/17 22:15  9 kB  Remus Rusanu

People

Assignee: Remus Rusanu

Reporter: Remus Rusanu

Votes: 0

Watchers: 6

Dates

Created: 25/May/17 12:27

Updated: 27/Feb/24 22:23

Resolved: 31/May/17 05:39