Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.5.0, Impala 2.6.0, Impala 2.7.0
Description
For queries that have deeply nested unions and WHERE-clause subqueries, UNION ALL may sometimes be incorrectly changed to a UNION DISTINCT after subquery rewriting. This modification may lead to incorrect results.
Example:
select * from functional.alltypestiny where exists (select 1 from functional.alltypes where int_col < 10) union all (select * from functional.alltypestiny where year=2009 and month=1 union all select * from functional.alltypestiny where year=2009 and month=2) PLAN-ROOT SINK | 12:EXCHANGE [UNPARTITIONED] | 06:UNION | |--07:SCAN HDFS [functional.alltypestiny] | partitions=1/4 files=1 size=115B | 11:AGGREGATE [FINALIZE] | group by: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col, year, month | 10:EXCHANGE [HASH(id,bool_col,tinyint_col,smallint_col,int_col,bigint_col,float_col,double_col,date_string_col,string_col,timestamp_col,year,month)] | 05:AGGREGATE [STREAMING] | group by: id, bool_col, tinyint_col, smallint_col, int_col, bigint_col, float_col, double_col, date_string_col, string_col, timestamp_col, year, month | 00:UNION | |--04:SCAN HDFS [functional.alltypestiny] | partitions=1/4 files=1 size=115B | 03:NESTED LOOP JOIN [LEFT SEMI JOIN, BROADCAST] | |--09:EXCHANGE [BROADCAST] | | | 08:EXCHANGE [UNPARTITIONED] | | limit: 1 | | | 02:SCAN HDFS [functional.alltypes] | partitions=24/24 files=24 size=478.45KB | predicates: int_col < 10 | limit: 1 | 01:SCAN HDFS [functional.alltypestiny] partitions=4/4 files=4 size=460B
Notice the aggregation above which performs the DISTINCT.
Workaround
Manually unnest the union as follows to get a correct plan and result:
select * from functional.alltypestiny where exists (select 1 from functional.alltypes where int_col < 10) union all select * from functional.alltypestiny where year=2009 and month=1 union all select * from functional.alltypestiny where year=2009 and month=2