Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Duplicate
-
Impala 2.2, Impala 2.3.0
-
None
Description
A query may return incorrect results due to wrong predicate assignment in the following scenario:
1. There is an inline view that contains an outer join
2. That inline view is joined with another table in the enclosing query block
3. That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside the inline view
Example/Repro:
create table t (c1 int, c2 int); insert into t values(1, 1); The following query returns wrong results. We expect an empty result set. select * from t t1 join (select a.c1, b.c2 from t a left outer join t b on a.c1 = b.c2) v on v.c2 is null +----+----+----+------+ | c1 | c2 | c1 | c2 | +----+----+----+------+ | 1 | 1 | 1 | NULL | +----+----+----+------+ Plan: +------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=4.06GB VCores=2 | | WARNING: The following tables are missing relevant table and/or column statistics. | | default.t | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 04:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | | | | hash predicates: a.c1 = b.c2 | | | | | | | |--05:EXCHANGE [BROADCAST] | | | | | | | | | 02:SCAN HDFS [default.t b] | | | | partitions=1/1 files=1 size=4B | | | | predicates: b.c2 IS NULL <--- This is the bug. Predicate should be assigned to the LEFT OUTER JOIN. | | | | | | | 01:SCAN HDFS [default.t a] | | | partitions=1/1 files=1 size=4B | | | | | 00:SCAN HDFS [default.t t1] | | partitions=1/1 files=1 size=4B | +------------------------------------------------------------------------------------+
Workaround
As a workaround the offending predicate from the On-clause can be moved into the WHERE clause. Note that this workaround is only correct will if the original On-clause belongs to an INNER or SEMI join.
Query with workaround applied returns correct results.
select * from t t1 join (select a.c1, b.c2 from t a left outer join t b on a.c1 = b.c2) v where v.c2 is null Correctly returns an empty result set. Plan: +------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=4.06GB VCores=2 | | WARNING: The following tables are missing relevant table and/or column statistics. | | default.t | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 04:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] | | | | | |--06:EXCHANGE [BROADCAST] | | | | | | | 03:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | | | | hash predicates: a.c1 = b.c2 | | | | other predicates: b.c2 IS NULL <--- Correct assignment. | | | | | | | |--05:EXCHANGE [BROADCAST] | | | | | | | | | 02:SCAN HDFS [default.t b] | | | | partitions=1/1 files=1 size=4B | | | | | | | 01:SCAN HDFS [default.t a] | | | partitions=1/1 files=1 size=4B | | | | | 00:SCAN HDFS [default.t t1] | | partitions=1/1 files=1 size=4B | +------------------------------------------------------------------------------------+