Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-4697

Subqueries with IN and NOT IN

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • None
    • None
    • Query Processor
    • None

    Description

      Functional Requirements
      • Support WHERE x IN (<column subquery>);
        • <column subquery> returns one column, any number of rows.
      • Support WHERE x NOT IN (<column subquery>);
      • Support same types of subqueries in HAVING.
        • E.g.
          SELECT key FROM t1 GROUP BY key 
          HAVING COUNT(value) IN (SELECT p FROM t2);  
      • Correlated subqueries not supported, for now at least
        • But still need to check for correlation, and bail if it occurs.
        • Correlated subquery:
          • A subquery that references a table that appears in a containing query (MySQL), thus requiring subquery evaluation to look outside its scope.
          • The subquery depends on the outer query for its values, so the subquery must be executed once for each row of the outer query. Also known as repeating Subqueries.
      Tasks
      • Rewrite IN (<column-subquery>) as a LEFT SEMI JOIN.
        • Not ready for public consumption. In particular, no check for correlated terms.
        • With test queries.
      • Add check for correlated terms, return informative error message.
      • Rewrite WHERE NOT IN (<column-subquery>) as a LEFT OUTER JOIN.
        • Return rows that don't match the right side
      • Rewrite subqueries in HAVING, using LEFT SEMI JOIN and LEFT OUTER JOIN as above.

      Attachments

        Issue Links

          Activity

            People

              mweaverh Matthew Weaver
              mweaverh Matthew Weaver
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 840h
                  840h
                  Remaining:
                  Remaining Estimate - 840h
                  840h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified