Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-5873

SubQuery: In subquery Count Bug

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      This is from the Optimization of Nested SQl Queries Revisited paper: http://dl.acm.org/citation.cfm?id=38723

      Consider Part table having:

      PNum OrderOnHand
      -------- ------------------
      3          6
      10        1
      8          0
      

      Supply table having:

      PNum  Qty  
      3          4
      3          2
      10        1
      

      The query:

      select pnum
      from parts p
      where orderOnHand
           in (select count(*) from supply s
                where s.pnum = p.pnum
               )
      

      should return the row with PNum=8.
      But a transformation to a semi-join would eliminate this row, as there are no rows in supply table with PNum=8.

      AS shown in the paper the soln is to transform to:

      select pnum
      from parts p semijoin
              (select p1.pnum, count(*) as c
                from (select distinct pnum from parts) p1 join supply s
                where s.pnum = p1.pnum
               ) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c
      

      The additional distinct query within the SubQuery is to handle duplicates in the outer query on the joining columns.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              rhbutani Harish Butani
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: