[HIVE-870] Implement LEFT SEMI JOIN - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.5.0
Component/s: Query Processor
Labels: None
Hadoop Flags: Reviewed

Description

Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,

select *
from A
where A.id IN
  (select id
   from B
   where B.date > '2009-10-01');

returns the rows of A whose ID appears in the set of IDs found in B with a date later than 2009-10-01. This query can be unnested using an INNER JOIN or a LEFT OUTER JOIN, but then we need to deduplicate the IDs returned by the subquery on table B, since an ID that occurs several times in B would otherwise repeat the matching row of A. The semantics of LEFT SEMI JOIN are that as long as there is ANY row in the right-hand table that matches the join key, the left-hand table row is emitted once, without necessarily scanning the right-hand table for further matches. These are exactly the semantics of the IN subquery.
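The difference between the three formulations can be checked on a toy data set. The sketch below is only an illustration, not the Hive implementation: it uses Python's built-in sqlite3 module (SQLite has no LEFT SEMI JOIN), the `name` column and the sample rows are made up, and `left_semi_join` is a hypothetical helper that mimics the "stop at the first match" behaviour described above.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the issue).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, name TEXT);
    CREATE TABLE B (id INTEGER, date TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    -- id 1 matches twice, id 2 only before the cutoff date, id 3 never
    INSERT INTO B VALUES (1, '2009-10-05'), (1, '2009-10-07'), (2, '2009-09-30');
""")

# The IN subquery from the description: each qualifying row of A appears once.
in_subquery = conn.execute("""
    SELECT * FROM A
    WHERE A.id IN (SELECT id FROM B WHERE B.date > '2009-10-01')
    ORDER BY A.id
""").fetchall()

# Naive unnesting with an inner join: rows of A repeat once per match in B.
inner_join = conn.execute("""
    SELECT A.* FROM A JOIN B ON A.id = B.id
    WHERE B.date > '2009-10-01'
    ORDER BY A.id
""").fetchall()


def left_semi_join(left_rows, right_rows, left_key, right_key):
    """Emit each left row once if ANY right row shares its join key."""
    result = []
    for left in left_rows:
        # any() short-circuits, so scanning stops at the first match.
        if any(left_key(left) == right_key(right) for right in right_rows):
            result.append(left)
    return result


a_rows = conn.execute("SELECT * FROM A ORDER BY id").fetchall()
b_rows = conn.execute("SELECT * FROM B WHERE date > '2009-10-01'").fetchall()
semi = left_semi_join(a_rows, b_rows, lambda a: a[0], lambda b: b[0])

print(in_subquery)  # rows of A selected by the IN subquery
print(inner_join)   # the same rows, duplicated by the inner join
print(semi)         # semi-join result, expected to equal in_subquery
```

With the feature in place, the same query would presumably be written along the lines of `SELECT A.* FROM A LEFT SEMI JOIN B ON (A.id = B.id AND B.date > '2009-10-01')`; that exact syntax is an assumption here and is not taken from the issue text.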

Attachments

Hive-870_2.patch, 06/Nov/09 07:28, 420 kB, Ning Zhang
Hive-870_3.patch, 09/Nov/09 22:40, 147 kB, Ning Zhang
Hive-870.patch, 04/Nov/09 07:18, 413 kB, Ning Zhang

Issue Links

relates to: HIVE-784 Support uncorrelated subqueries in the WHERE clause (Resolved)

People

Assignee: Ning Zhang

Reporter: Ning Zhang

Votes: 0

Watchers: 4

Dates

Created: 05/Oct/09 23:49

Updated: 17/Dec/11 00:05

Resolved: 10/Nov/09 20:19