[SPARK-43778] RewriteCorrelatedScalarSubquery should handle duplicate attributes - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 4.0.0
Component/s: SQL
Labels:
- pull-request-available

Description

This is a correctness problem: the decorrelation rule (RewriteCorrelatedScalarSubquery) does not deduplicate join attributes properly. As a result, the join condition is emitted as (c1 = c1), which is simplified to True, so the join becomes a cross product and the query returns wrong results.

Example query:

create view t(c1, c2) as values (0, 1), (0, 2), (1, 2)

select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 having cnt = 0) from t t1
-- Correct answer: [(0, 1, null), (0, 2, null), (1, 2, null)]
-- Actual (incorrect) result, with each outer row returned twice:
+---+---+------------------+
|c1 |c2 |scalarsubquery(c1)|
+---+---+------------------+
|0  |1  |null              |
|0  |1  |null              |
|0  |2  |null              |
|0  |2  |null              |
|1  |2  |null              |
|1  |2  |null              |
+---+---+------------------+
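The duplication can be reproduced outside Spark with a small Python simulation. This is only a sketch under stated assumptions, not Spark's implementation: it assumes the rewritten plan is a left outer join between t1 and a per-c1 aggregate of t2, with the HAVING condition evaluated above the join. The helper names (aggregate_by_c1, left_outer_join, run_query) are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Rows of the view t(c1, c2) from the example query.
T = [(0, 1), (0, 2), (1, 2)]


def aggregate_by_c1(rows):
    """Count rows per c1; stands in for the aggregate of the decorrelated subquery."""
    return sorted(Counter(c1 for c1, _ in rows).items())  # [(c1, cnt), ...]


def left_outer_join(left, right, cond):
    """Nested-loop left outer join; unmatched left rows are paired with None."""
    out = []
    for l in left:
        matches = [r for r in right if cond(l, r)]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def run_query(join_cond):
    """Evaluate the example query with the given join condition (assumed plan shape)."""
    joined = left_outer_join(T, aggregate_by_c1(T), join_cond)
    result = []
    for (c1, c2), agg in joined:
        cnt = agg[1] if agg is not None else 0  # count(*) over no rows is 0
        # HAVING cnt = 0: the scalar value is cnt when it holds, otherwise null.
        result.append((c1, c2, cnt if cnt == 0 else None))
    return result


# Intended condition: t1.c1 = t2.c1, comparing the two sides of the join.
correct = run_query(lambda l, r: l[0] == r[0])
# Condition after the bug: (c1 = c1) folds to True, giving a cross product.
buggy = run_query(lambda l, r: True)

print(correct)
print(buggy)
```

With the intended condition each outer row matches exactly one aggregate row, so three rows come back. With the always-true condition each outer row pairs with both aggregate rows (c1 = 0 and c1 = 1), which yields the six rows shown in the table above.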

Issue Links

is blocked by

SPARK-43838 Subquery on single table with having clause can't be optimized (Resolved)

is fixed by

SPARK-43838 Subquery on single table with having clause can't be optimized (Resolved)

links to

GitHub Pull Request #41439

People

Assignee: Unassigned

Reporter: Andrey Gubichev

Votes: 0

Watchers: 2

Dates

Created: 24/May/23 18:17

Updated: 22/Oct/23 00:19

Resolved: 20/Jul/23 05:08