Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
0.13.1
-
None
Description
I have two queries which should be equal (I only swap two join conditions) but they are not. They are simplest queries I could produce to reproduce bug.
I have two simple tables:
desc kgorlo_comm;
col_name | data_type | comment |
id | bigint | |
dest_id | bigint |
desc kgorlo_log;
col_name | data_type | comment |
id | bigint | |
dest_id | bigint | |
tstamp | bigint |
With data:
select * from kgorlo_comm;
kgorlo_comm.id | kgorlo_comm.dest_id |
1 | 2 |
2 | 1 |
1 | 3 |
2 | 3 |
3 | 5 |
4 | 5 |
select * from kgorlo_log;
kgorlo_log.id | kgorlo_log.dest_id | kgorlo_log.tstamp |
1 | 2 | 0 |
1 | 3 | 0 |
1 | 5 | 0 |
3 | 1 | 0 |
And when I run this query (query no. 1):
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.dest_id=log.id and com2.id=log.dest_id;
I get result (which is correct):
log.id | log.dest_id | com1.msgs | com2.msgs |
1 | 2 | 1 | 1 |
1 | 3 | 1 | NULL |
1 | 5 | NULL | NULL |
3 | 1 | NULL | 1 |
But when I run second query (query no. 2):
select log.id, log.dest_id, com1.msgs, com2.msgs from kgorlo_log log
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com1 on com1.id=log.id and com1.dest_id=log.dest_id
left outer join (select id, dest_id, count( * ) as msgs from kgorlo_comm group by id, dest_id)com2 on com2.id=log.dest_id and com2.dest_id=log.id;
I get different (and bad, in my opinion) result:
log.id | log.dest_id | com1.msgs | com2.msgs |
1 | 2 | 1 | 1 |
1 | 3 | 1 | 1 |
1 | 5 | NULL | NULL |
3 | 1 | NULL | NULL |
Query no. 1 and query no. 2 are different in only one place, it is second join condition:
bf. com2.dest_id=log.id and com2.id=log.dest_id
vs
bf. com2.id=log.dest_id and com2.dest_id=log.id
which in my opinion are equal.
Explains for both queries are of course slightly different (columns are swapped) and they are here:
https://gist.github.com/kgs/399ad7ca2c481bd2c018 (query no. 1, good)
https://gist.github.com/kgs/bfb3216f0f1fbc28037e (query no. 2, bad)
Attachments
Issue Links
- duplicates
-
HIVE-8298 Incorrect results for n-way join when join expressions are not in same order across joins
-
- Closed
-