Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 5.0-alpha
- Component: None
Description
When pushing down a query, KE calculates the size of the data involved in order to set the number of Spark partitions. If there are too many files on HDFS, this calculation can take a long time to complete.
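As a rough illustration of why the data size matters, a pushdown engine can derive the shuffle-partition count from the total size of the files a query touches. The class name, method, and the 64 MB-per-partition heuristic below are illustrative assumptions, not Kylin's actual implementation:

```java
// Hypothetical sketch: derive a Spark shuffle-partition count from the
// total input size, scaled by a configurable multiple.
public class ShufflePartitionEstimator {
    // Assumed heuristic: aim for roughly 64 MB of input per partition.
    static final long BYTES_PER_PARTITION = 64L * 1024 * 1024;

    // Returns at least `multiple` partitions, growing with the input size.
    static int estimatePartitions(long totalBytes, int multiple) {
        long base = Math.max(1, totalBytes / BYTES_PER_PARTITION);
        return (int) Math.min(Integer.MAX_VALUE, base * multiple);
    }

    public static void main(String[] args) {
        // 640 MB of input with a multiple of 3 -> 10 * 3 = 30 partitions.
        System.out.println(estimatePartitions(640L * 1024 * 1024, 3));
    }
}
```

The expensive part is obtaining totalBytes, which requires listing and stat-ing every file on HDFS; that is the step the changes below bound.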
To improve this situation, the following changes will be made:
- Use a limited thread pool to calculate the data size.
- Add a timeout for the calculation, so that a slow calculation stops blocking the query as soon as possible.
- Add two new properties:
  - kylin.query.pushdown.auto-set-shuffle-partitions-multiple=3: the multiple applied when setting the Spark partition number, 3 by default.
  - kylin.query.pushdown.auto-set-shuffle-partitions-timeout=30: the maximum time allowed for calculating the data size used to adjust the Spark partition number, 30 seconds by default.
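The combination of a bounded pool and a timeout can be sketched as below. The pool size, the fallback value, and the summing logic are illustrative assumptions, not Kylin's exact code:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch: compute per-file sizes on a limited thread pool and give up
// after a timeout, so the query is never blocked indefinitely.
public class DataSizeCalculator {
    // Limited pool: at most 4 concurrent size lookups (assumed value).
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // Sums the sizes returned by the per-file tasks. Returns -1 if the
    // calculation times out or fails, signalling the caller to fall back
    // to the default Spark partition number.
    static long totalSize(List<Callable<Long>> fileSizeTasks, long timeoutSeconds) {
        try {
            // invokeAll cancels any task still running when the timeout expires.
            List<Future<Long>> futures =
                    POOL.invokeAll(fileSizeTasks, timeoutSeconds, TimeUnit.SECONDS);
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // throws CancellationException if timed out
            }
            return total;
        } catch (InterruptedException | ExecutionException | CancellationException e) {
            return -1; // timed out or failed: use the default partition number
        }
    }

    public static void main(String[] args) {
        List<Callable<Long>> tasks = Arrays.asList(() -> 100L, () -> 200L);
        System.out.println(totalSize(tasks, 30)); // prints 300
        POOL.shutdown();
    }
}
```

Returning a sentinel on timeout rather than throwing keeps the query path simple: the caller falls back to the default partition number instead of failing the query.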
After these changes, the query can be expected to complete within a bounded duration.