[SPARK-19730] Predicate Subqueries do not push results of subqueries to data source - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: Optimizer, SQL
Labels:
- bulk-closed

Description

When a SparkSQL query contains a subquery in the where clause, such as a predicate query using the IN operator, the results of that subquery are not pushed down as a filter to the DataSourceAPI for the outer query.

Example:
Select point, time, value from data where time between now()-86400 and now() and point in (select point from groups where group_id=5);

Two queries will be sent to the data Source. One for the subquery, and another for the outer query. The subquery works correctly returning the points in the group, however, outer query does not push a filter for point column.

Affect:
The "group" table has a few hundred rows to group a few hundred thousand points. The data table has several billion rows keyed by point and time. Without the ability to push down the filters for the columns of outer the query, the data source cannot properly conduct its pruned scan.

The subquery results should be pushed down to the outer query as an IN Filter with the results of the subquery.

Physical Plan

*Project point#263, value#270
+- SortMergeJoin point#263, col#284, LeftSemi
:- *Sort point#263 ASC NULLS FIRST, false, 0
: +- Exchange hashpartitioning(point#263, 20)
: +- *Filter ((time#264L >= 1487964691000) && (time#264L <= 1487964696000))
: +- *Scan @4b455128 DBNAME.data[ point#263, time#264Lvalue#270] PushedFilters: [GreaterThanOrEqual(time,1487964691000), LessThanOrEqual(time,1487964691000), ReadSchema: struct<point:int,time:bigint,value:double>...
+- *Sort col#284 ASC NULLS FIRST, false, 0
+- Exchange hashpartitioning(col#284, 20)
+- Generate explode(points#273), false, false, col#284
+- *Project points#273
+- *Filter (group_id#272 = 1)
+- *Scan @12fb3c1a <db>.groupspoints#273,group_id#272 PushedFilters: [EqualTo(group_id,1)], ReadSchema: struct<points:array<int>> |

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Shawn Lavelle

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 24/Feb/17 18:04

Updated:: 21/May/19 04:11

Resolved:: 21/May/19 04:11