[KYLIN-5401] Optimize code logic for pushdown queries - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: v4.0.1, v4.0.2, v4.0.3
Fix Version/s: None
Component/s: Query Engine, Spark Engine
Labels: None

Description

Push-down queries in Kylin 4.0.x are very slow even for simple queries, e.g. select * from table limit 10. Such a query should return within seconds, but it often takes several minutes, and the larger the queried data set, the longer it takes, which is abnormal.

BI tools often run simple queries like this to display detail data. The abnormal query duration frequently causes BI tools to time out and return error messages, which makes for a poor user experience.

Investigation shows that the query plan of this very simple detail query contains a shuffle, which a plain limit query should not need.

The main logic for executing push-down queries in Kylin is concentrated in org.apache.kylin.query.pushdown.SparkSqlClient.

The main cause of the problem is an unnecessary Spark DataFrame type transformation in org.apache.kylin.query.pushdown.SparkSqlClient#DFToList.
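
One way to confirm the shuffle described above is to print the physical plan of the query and look for an Exchange node, which is how Spark displays a shuffle. The following is only a minimal sketch under assumptions: it uses a local SparkSession and a hypothetical temp view named demo_table, and it does not reproduce the code in SparkSqlClient#DFToList. Applying the same check to the DataFrame that DFToList actually collects should show whether the transformation introduces the extra stage.

```scala
import org.apache.spark.sql.SparkSession

object PushdownPlanCheck {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); Kylin manages its own SparkSession.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("pushdown-plan-check")
      .getOrCreate()

    // Hypothetical stand-in for a real source table.
    spark.range(0, 1000000).toDF("id").createOrReplaceTempView("demo_table")

    val df = spark.sql("select * from demo_table limit 10")

    // Print the physical plan; a shuffle shows up as an Exchange node.
    df.explain()

    // Crude textual check. Note that it also matches BroadcastExchange,
    // which is not a shuffle, so read the printed plan before drawing conclusions.
    val planText = df.queryExecution.executedPlan.toString
    println(s"plan mentions Exchange: ${planText.contains("Exchange")}")

    spark.stop()
  }
}
```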

People

Assignee: Unassigned

Reporter: yuan

Votes: 0

Watchers: 1

Dates

Created: 12/Jan/23 12:51

Updated: 13/Jan/23 07:57