Spark / SPARK-27956

Allow subqueries as partition filter


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      Subqueries are not pushed down as partition filters. See the following example:

       

      create table user_mayerjoh.tab (c1 string)
      partitioned by (c2 string)
      stored as parquet;

      explain select * from user_mayerjoh.tab where c2 < 1;

       

      == Physical Plan ==
      *(1) FileScan parquet user_mayerjoh.tab[c1#22,c2#23] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(c2#23), (cast(c2#23 as int) < 1)], PushedFilters: [], ReadSchema: struct<c1:string>


      explain select * from user_mayerjoh.tab where c2 < (select 1);

       

      == Physical Plan ==
      +- *(1) FileScan parquet user_mayerjoh.tab[c1#30,c2#31] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(c2#31)], PushedFilters: [], ReadSchema: struct<c1:string>

       

      Is it possible to first execute the subquery and then use its result as a partition filter?
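
      A common workaround, sketched below under the assumption of a live SparkSession named `spark` and the table from the example above (this is illustrative, not code from the issue), is to evaluate the scalar subquery eagerly on the driver and inline its result as a literal. With a constant predicate, Catalyst can prune partitions at planning time:

      // Driver-side workaround sketch (assumes an active SparkSession `spark`).
      // 1. Run the subquery on its own and collect the scalar result.
      val threshold: Int = spark.sql("select 1").collect()(0).getInt(0)

      // 2. Inline the collected value as a literal so the predicate is constant
      //    at planning time and can be used as a partition filter.
      val df = spark.sql(s"select * from user_mayerjoh.tab where c2 < $threshold")
      df.explain() // PartitionFilters should now include the cast/compare predicate

      This trades one extra driver-side round trip for the pruned scan, which is usually a good bargain when the table has many partitions.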


          People

            Assignee: Unassigned
            Reporter: joha0123 Johannes Mayer
            Votes: 0
            Watchers: 1
