Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.2
-
None
Description
where CTAS and IAS produce different results as a plain SELECT when using FULL JOINs and static value columns.
It's a regression. Seen in Impala 2.1.3+ (starting in CDH 5.3.5) but not earlier version.
Performing the same steps in Hive produce expected results. Also verified expected results in postgresql.
Example:
create table raw (id STRING, val STRING); insert into raw values ('user1','one'), ('user2','two'); select * from raw; +-------+-----+ | id | val | +-------+-----+ | user1 | one | | user2 | two | +-------+-----+ select a.id as a_id, b.id as b_id, one, two from ( select id, 'Y' as one from raw where val = 'one' group by id ) a full join ( select id, 'Y' as two from raw where val = 'two' group by id ) b on a.id = b.id; +-------+-------+------+------+ | a_id | b_id | one | two | +-------+-------+------+------+ | user1 | NULL | Y | NULL | | NULL | user2 | NULL | Y | +-------+-------+------+------+ create table test1 as select a.id as a_id, b.id as b_id, one, two from ( select id, 'Y' as one from raw where val = 'one' group by id ) a full join ( select id, 'Y' as two from raw where val = 'two' group by id ) b on a.id = b.id; select * from test1; +-------+-------+-----+-----+ | a_id | b_id | one | two | +-------+-------+-----+-----+ | user1 | NULL | Y | Y | | NULL | user2 | Y | Y | +-------+-------+-----+-----+
Testing using INSERT also produces the same incorrect result.
Workaround
To produce correct results, you can rewrite the with a UNION ALL to force the inline-view results to be materialized.
Before CTAS with incorrect results:
create table test1 as select a.id as a_id, b.id as b_id, one, two from (select id, 'Y' as one from raw where val = 'one' group by id) a full join (select id, 'Y' as two from raw where val = 'two' group by id) b on a.id = b.id;
After CTAS with correct results:
create table test1 as select a.id as a_id, b.id as b_id, one, two from (select id, 'Y' as one from raw where val = 'one' group by id union all <-- This is the workaround select NULL, NULL from raw where false) a <-- This is the workaround full join (select id, 'Y' as two from raw where val = 'two' group by id union all <-- This is the workaround select NULL, NULL from raw where false) b <-- This is the workaround on a.id = b.id;