[SPARK-43413] IN subquery ListQuery has wrong nullability - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.4.0
Fix Version/s: 3.5.0
Component/s: SQL
Labels:
None

Description

IN subquery expressions are incorrectly marked as non-nullable, even when they are actually nullable. They correctly check the nullability of the left-hand-side, but the right-hand-side of a IN subquery, the ListQuery, is currently defined with nullability = false always. This is incorrect and can lead to incorrect query transformations.

Example: (non_nullable_col IN (select nullable_col)) <=> TRUE . Here the IN expression returns NULL when the nullable_col is null, but our code marks it as non-nullable, and therefore SimplifyBinaryComparison transforms away the <=> TRUE, transforming the expression to non_nullable_col IN (select nullable_col) , which is an incorrect transformation because NULL values of nullable_col now cause the expression to yield NULL instead of FALSE.

This bug can potentially lead to wrong results, but in most cases this doesn't directly cause wrong results end-to-end, because IN subqueries are almost always transformed to semi/anti/existence joins in RewritePredicateSubquery, and this rewrite can also incorrectly discard NULLs, which is another bug. But we can observe it causing wrong behavior in unit tests, and it could easily lead to incorrect query results if there are changes to the surrounding context, so it should be fixed regardless.

This is a long-standing bug that has existed at least since 2016, as long as the ListQuery class has existed.

Attachments

Issue Links

links to

[Github] Pull Request #41094 (jchen5)

Activity

People

Assignee:: Jack Chen

Reporter:: Jack Chen

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 08/May/23 21:12

Updated:: 16/May/23 01:41

Resolved:: 16/May/23 01:41