[SPARK-17348] Incorrect results from subquery transformation - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0
Fix Version/s: 2.0.3, 2.1.0
Component/s: SQL
Labels:
- correctness

Description

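// Run in spark-shell, where spark.implicits._ (for toDF) and spark.sql are already imported
Seq((1,1)).toDF("c1","c2").createOrReplaceTempView("t1")
Seq((1,1),(2,0)).toDF("c1","c2").createOrReplaceTempView("t2")
// Correlated IN subquery whose correlated predicate is a non-equality (>=)
sql("select c1 from t1 where c1 in (select max(t2.c1) from t2 where t1.c2 >= t2.c2)").show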

+---+
| c1|
+---+
|  1|
+---+

The query above returns one row, but the correct result is an empty set. Here is why:

For the only row of T1 (C1 = 1, C2 = 1), both rows of T2 satisfy the correlated predicate T1.C2 >= T2.C2 (1 >= 1 and 1 >= 0), so both rows must be processed in the same group by the aggregation in the subquery. The aggregation yields MAX(T2.C1) = 2. The predicate T1.C1 IN (MAX(T2.C1)), that is 1 IN (2), is therefore false, and the query should return no rows.
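To make the expected semantics concrete, here is a minimal sketch that evaluates the query with plain Scala collections instead of Spark (no Spark dependency; the object and method names are hypothetical). `correct` follows the reasoning above: for each T1 row it filters T2 by the correlated predicate and aggregates all qualifying rows as a single group. `naive` is an assumption for illustration only, not a description of the rule Spark actually applies: it aggregates T2 grouped by C2 before applying the non-equality correlated predicate, which is one way a decorrelation could reproduce the incorrect row shown above.

```scala
// Plain Scala model of the query; no Spark involved.
object CorrelatedInCheck {
  final case class Row(c1: Int, c2: Int)

  val t1: Seq[Row] = Seq(Row(1, 1))
  val t2: Seq[Row] = Seq(Row(1, 1), Row(2, 0))

  // Correct semantics: per outer row, keep the T2 rows satisfying t1.c2 >= t2.c2,
  // aggregate them as ONE group, then test c1 IN (max).
  def correct: Seq[Int] =
    t1.filter { o =>
      val matching = t2.filter(i => o.c2 >= i.c2)
      // MAX over an empty group is NULL in SQL; modelled as None, which never matches
      val maxC1 = if (matching.isEmpty) None else Some(matching.map(_.c1).max)
      maxC1.contains(o.c1)
    }.map(_.c1)

  // Hypothetical naive decorrelation (illustration only): compute MAX(c1) per t2.c2
  // group first, then semi-join on t1.c2 >= t2.c2 and t1.c1 = max.
  // Grouping by c2 splits rows that should have been aggregated together.
  def naive: Seq[Int] = {
    val maxPerC2: Map[Int, Int] =
      t2.groupBy(_.c2).map { case (k, rows) => k -> rows.map(_.c1).max }
    t1.filter { o =>
      maxPerC2.exists { case (c2, m) => o.c2 >= c2 && m == o.c1 }
    }.map(_.c1)
  }

  def main(args: Array[String]): Unit = {
    println(s"correct: $correct")
    println(s"naive:   $naive")
  }
}
```

Running `main` prints `correct: List()` and `naive:   List(1)`: the per-group maxima are 1 (for C2 = 1) and 2 (for C2 = 0), and the outer row matches the first group, whereas aggregating both qualifying rows together gives only 2.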

Issue Links

relates to
- SPARK-18455 General support for correlated subquery processing (Resolved)

links to
- [Github] Pull Request #15763 (nsyca)

People

Assignee: Nattavut Sutyanyong
Reporter: Nattavut Sutyanyong
Votes: 0
Watchers: 4

Dates

Created: 31/Aug/16 20:30
Updated: 15/Nov/16 22:22
Resolved: 14/Nov/16 20:04