Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Duplicate
Affects Version/s: 0.13.0
None
None
Environment: Linux 2.6.32-279.19.1.el6.x86_64
Description
Joining a main table to several other tables, each on a different column of the main table, yields wrong results in Hive.
Changing the order of the joins between the main table and the other tables produces different results.
Please see below for the steps to reproduce the issue:
1. Create tables as follows:
create table p(ck string, email string);
create table a1(ck string, flag string);
create table a2(email string, flag string);
create table a3(ck string, flag string);
2. Load data into the tables as follows:
P
ck | email |
---|---|
10 | e10 |
20 | e20 |
30 | e30 |
40 | e40 |
A1
ck | flag |
---|---|
10 | N |
20 | Y |
30 | Y |
40 | Y |
A2
email | flag |
---|---|
e10 | Y |
e20 | N |
e30 | Y |
e40 | Y |
A3
ck | flag |
---|---|
10 | Y |
20 | Y |
30 | N |
40 | Y |
3. Good query:
select p.ck
from p
left outer join a1 on p.ck = a1.ck
left outer join a3 on p.ck = a3.ck
left outer join a2 on p.email = a2.email
where a1.flag = 'Y'
and a3.flag = 'Y'
and a2.flag = 'Y'
;
Results:
40
4. Bad query:
select p.ck
from p
left outer join a1 on p.ck = a1.ck
left outer join a2 on p.email = a2.email
left outer join a3 on p.ck = a3.ck
where a1.flag = 'Y'
and a2.flag = 'Y'
and a3.flag = 'Y'
;
Results:
30
40
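For reference, the expected result can be cross-checked outside Hive. The following is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in SQL engine; it is not part of the original report, and the `run` helper, the `text` column types, and the added `order by` are illustrative choices. Because the WHERE clause requires flag = 'Y' on all three joined tables, rows without a match are filtered out in any case, so the join order should not affect the result, and only ck 40 has 'Y' in a1, a2, and a3.

```python
import sqlite3

# Stand-in engine (assumption): in-memory SQLite with the same tables and rows as the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table p(ck text, email text);
create table a1(ck text, flag text);
create table a2(email text, flag text);
create table a3(ck text, flag text);
insert into p  values ('10','e10'), ('20','e20'), ('30','e30'), ('40','e40');
insert into a1 values ('10','N'),   ('20','Y'),   ('30','Y'),   ('40','Y');
insert into a2 values ('e10','Y'),  ('e20','N'),  ('e30','Y'),  ('e40','Y');
insert into a3 values ('10','Y'),   ('20','Y'),   ('30','N'),   ('40','Y');
""")

# Join clauses copied from the report's queries, keyed by table name.
JOINS = {
    "a1": "left outer join a1 on p.ck = a1.ck",
    "a2": "left outer join a2 on p.email = a2.email",
    "a3": "left outer join a3 on p.ck = a3.ck",
}

def run(join_order):
    """Hypothetical helper: run the report's query with the joins in the given order."""
    sql = (
        "select p.ck from p "
        + " ".join(JOINS[name] for name in join_order)
        + " where a1.flag = 'Y' and a2.flag = 'Y' and a3.flag = 'Y'"
        + " order by p.ck"  # added only to make the output order deterministic
    )
    return [row[0] for row in conn.execute(sql)]

good_order = run(["a1", "a3", "a2"])  # join order of the "good" query
bad_order = run(["a1", "a2", "a3"])   # join order of the "bad" query
print(good_order, bad_order)
```

In this sketch both join orders return only 40, which matches the good query above and suggests that the extra row 30 from the bad query is the incorrect one.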
Issue Links
- duplicates HIVE-10841 [WHERE col is not null] does not work sometimes for queries with many JOIN statements (Closed)