Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Semi-join is an efficient way to unnest an IN/EXISTS subquery. For example,

      select *
      from A
      where A.id IN
      (select id
      from B
      where B.date> '2009-10-01');

      returns from A whose ID is in the set of IDs found in B, whose date is greater than a certain date. This query can be unnested using a INNER join or LEFT OUTER JOIN, but we need to deduplicate the IDs returned by the subquery on table B. The semantics of LEFT SEMI JOIN is that as long as there is ANY row in the right-hand table that matches the join key, the left-hand table row will be emitted as a result w/o necessarily looking further in the right-hand table for further matches. This is exactly the semantics of the IN subquery.

        Attachments

        1. Hive-870.patch
          413 kB
          Ning Zhang
        2. Hive-870_3.patch
          147 kB
          Ning Zhang
        3. Hive-870_2.patch
          420 kB
          Ning Zhang

          Issue Links

            Activity

              People

              • Assignee:
                nzhang Ning Zhang
                Reporter:
                nzhang Ning Zhang
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: