Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.2
-
None
Description
Having clause predicate is not correctly applied in the presence of outer joins.
To reproduce the issue:
create table table_a_t (id int, value int); create table table_b_t (id int, ivc int); insert into table table_a_t values (1,1200); insert into table table_a_t values (2,300); insert into table table_a_t values (3,400); insert into table table_b_t values (1,3); insert into table table_b_t values (2,4); select ivc, sum( value ) from table_a_t as a left join table_b_t as b on a.id = b.id group by ivc having ivc is not null; Query: select ivc, sum( value ) from table_a_t as a left join table_b_t as b on a.id = b.id group by ivc having ivc is not null +------+------------+ | ivc | sum(value) | +------+------------+ | 3 | 1200 | | NULL | 400 | | 4 | 300 | +------+------------+
From the query plan (below), we see that the having clause predicate is pushed through the outer join and isn't applied at the join result.
+------------------------------------------------------------------------------------+ | Explain String | +------------------------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=2.19GB VCores=2 | | WARNING: The following tables are missing relevant table and/or column statistics. | | default.table_a_t, default.table_b_t | | | | 07:EXCHANGE [UNPARTITIONED] | | | | | 06:AGGREGATE [FINALIZE] | | | output: sum:merge(value) | | | group by: ivc | | | | | 05:EXCHANGE [HASH(ivc)] | | | | | 03:AGGREGATE | | | output: sum(value) | | | group by: ivc | | | | | 02:HASH JOIN [LEFT OUTER JOIN, BROADCAST] | | | hash predicates: a.id = b.id | | | | | |--04:EXCHANGE [BROADCAST] | | | | | | | 01:SCAN HDFS [default.table_b_t b] | | | partitions=1/1 files=2 size=8B | | | *predicates: b.ivc IS NOT NULL * | | | | | 00:SCAN HDFS [default.table_a_t a] | | partitions=1/1 files=3 size=19B | +------------------------------------------------------------------------------------+