Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 0.3
-
None
Description
When using the right outer join, the result set is showing rows that are duplicates and with NULLs. For eg:
The left table looks like this (first column is the join_col)
A ---- 1
B ---- 2
C ---- 3
right table looks like (first column is the join_col)
A
B
D
E
The result:
A — 1
A — NULL
B — 2
B — NULL
D — NULL
E — NULL
The Impala query -
insert into table join_impala
select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk);
customer: "# of rows ~ 1.9 million (wrong)"
In hive the query used works fine:
insert into table join_hive
select distinct b.master_id as master_id, b.master_tk as master_tk, a.master_num as tk_master_num from std_clnt_master_tk a right outer join prepare_master b on (a.master_tk = b.master_tk);
customer: "# of rows ~ 1.8 million (correct answer, verified using an SQL database as well)"