[SPARK-23027] optimizer a simple query using a non-existent data is too slow - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 2.0.1
Fix Version/s: None
Component/s: Optimizer, SQL
Labels: None

Description

When I use Spark SQL for ad-hoc queries, my data is partitioned by event_day, event_minute, and event_hour. The data set is large: one day of data is about 3 TB, and we keep 3 months of data.
But when a query references a non-existent day, getting the optimized plan is too slow.
I use "sparkSession.sessionState.executePlan(logicalPlan).optimizedPlan" to get the optimized plan, and after five minutes I still do not have it. The query is simple enough, like:
SELECT
event_day
FROM db.table t1
WHERE (t1.event_day='20170104' and t1.event_hour='23' and t1.event_minute='55')
LIMIT 1
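
A minimal sketch of how the delay described above could be measured in isolation. It is not the reporter's code. It assumes that db.table is registered in a Hive metastore and that the Spark SQL and Spark Hive modules are on the classpath. The object name, the application name, and the timed helper are hypothetical. It reaches the optimized plan through the public Dataset.queryExecution accessor rather than through sessionState, which is not publicly accessible in every 2.x release.

```scala
import org.apache.spark.sql.SparkSession

object OptimizedPlanTiming {
  // Hypothetical helper: evaluates `body` once and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the table lives in a Hive metastore, so Hive support is enabled.
    val spark = SparkSession.builder()
      .appName("SPARK-23027-repro")
      .enableHiveSupport()
      .getOrCreate()

    val query =
      """SELECT event_day
        |FROM db.table t1
        |WHERE (t1.event_day='20170104' AND t1.event_hour='23' AND t1.event_minute='55')
        |LIMIT 1""".stripMargin

    // spark.sql parses and analyzes the query; it does not run a job.
    val df = spark.sql(query)

    // optimizedPlan is a lazy val, so the optimizer runs on first access.
    val (plan, millis) = timed(df.queryExecution.optimizedPlan)

    println(s"optimizedPlan took $millis ms")
    println(plan.treeString)

    spark.stop()
  }
}
```

Timing the optimizedPlan access separately from spark.sql(query) shows whether the time is spent in optimization rather than in parsing or analysis.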

People

Assignee: Unassigned

Reporter: wangminfeng

Votes: 0

Watchers: 3

Dates

Created: 10/Jan/18 11:56

Updated: 17/May/20 17:58

Resolved: 11/Jan/18 15:57