Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-5873

SubQuery: In subquery Count Bug

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Query Processor
    • None

    Description

      This is from the Optimization of Nested SQl Queries Revisited paper: http://dl.acm.org/citation.cfm?id=38723

      Consider Part table having:

      PNum OrderOnHand
      -------- ------------------
      3          6
      10        1
      8          0
      

      Supply table having:

      PNum  Qty  
      3          4
      3          2
      10        1
      

      The query:

      select pnum
      from parts p
      where orderOnHand
           in (select count(*) from supply s
                where s.pnum = p.pnum
               )
      

      should return the row with PNum=8.
      But a transformation to a semi-join would eliminate this row, as there are no rows in supply table with PNum=8.

      AS shown in the paper the soln is to transform to:

      select pnum
      from parts p semijoin
              (select p1.pnum, count(*) as c
                from (select distinct pnum from parts) p1 join supply s
                where s.pnum = p1.pnum
               ) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c
      

      The additional distinct query within the SubQuery is to handle duplicates in the outer query on the joining columns.

      Attachments

        Activity

          People

            Unassigned Unassigned
            rhbutani Harish Butani
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated: