[HIVE-1723] The result of left semi join is not correct - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: None
Labels: None

Release Note:
This bug is resolved in HIVE-1641.

Description

In the test case semijoin.q, there is a query:
select /*+ mapjoin(b) */ a.key from t3 a left semi join t1 b on a.key = b.key sort by a.key;
I think this query returns a wrong result if table t1 contains more than 25000 distinct keys.

To keep things simple, I tried a very similar query:
select /*+ mapjoin(b) */ a.key from test_semijoin a left semi join test_semijoin b on a.key = b.key sort by a.key;
The table test_semijoin looks like this:
0 0
1 1
2 2
3 3
4 4
5 5
... ...
... ...
25000 25000
25001 25001
... ...
... ...
25999 25999
26000 26000

Since the table is joined with itself on key, the correct result of this query should be every key in test_semijoin, that is, 0 through 26000.
The actual result contains only part of that: keys 0 through 24544.

0
1
2
..
..
24543
24544
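
To make the expected result concrete, here is a minimal reference model of left semi join semantics in plain Python. It is only a sketch, not Hive's map-join implementation: the helper name left_semi_join is hypothetical, and it assumes the elided rows of test_semijoin are the contiguous keys 0 through 26000 and that the observed output is the contiguous range 0 through 24544.

```python
def left_semi_join(left_keys, right_keys):
    """Reference semantics: keep each left row whose key has at least
    one match on the right; right-side duplicates never multiply rows."""
    right = set(right_keys)
    return [k for k in left_keys if k in right]


# Assumption: test_semijoin holds the contiguous keys 0..26000.
keys = list(range(26001))

# Self-join on key, then sort, as in the query from the report.
expected = sorted(left_semi_join(keys, keys))
print(len(expected), expected[0], expected[-1])

# Assumption: the observed output is the contiguous range 0..24544.
reported = list(range(24545))
missing = sorted(set(expected) - set(reported))
print(len(missing), missing[0], missing[-1])
```

Under these assumptions the model returns all 26001 keys, so the observed output would be missing the 1456 keys from 24545 through 26000.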

Issue Links

is part of: HIVE-1641 add map joined table to distributed cache (Closed)

People

Assignee: Liyin Tang
Reporter: Liyin Tang
Votes: 0
Watchers: 1

Dates

Created: 17/Oct/10 21:35
Updated: 07/Mar/11 18:55
Resolved: 07/Mar/11 18:55