Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels:
      None

      Description

      This is from the Optimization of Nested SQl Queries Revisited paper: http://dl.acm.org/citation.cfm?id=38723

      Consider Part table having:

      PNum OrderOnHand
      -------- ------------------
      3          6
      10        1
      8          0
      

      Supply table having:

      PNum  Qty  
      3          4
      3          2
      10        1
      

      The query:

      select pnum
      from parts p
      where orderOnHand
           in (select count(*) from supply s
                where s.pnum = p.pnum
               )
      

      should return the row with PNum=8.
      But a transformation to a semi-join would eliminate this row, as there are no rows in supply table with PNum=8.

      AS shown in the paper the soln is to transform to:

      select pnum
      from parts p semijoin
              (select p1.pnum, count(*) as c
                from (select distinct pnum from parts) p1 join supply s
                where s.pnum = p1.pnum
               ) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c
      

      The additional distinct query within the SubQuery is to handle duplicates in the outer query on the joining columns.

        Activity

        Hide
        julianhyde Julian Hyde added a comment -

        This is fixed in Calcite as part of CALCITE-365. It's worth re-evaluating when Hive picks up Calcite 1.12.

        By the way, it's ambiguous in Harish Butani's description above, but the query should return two rows (yes, including PNum=8). I confirmed on Postgres:

        > create table Part (PNum int, OrderOnHand int);
        CREATE TABLE
        > insert into Part values (3,6),(10,1),(8,0);
        INSERT 0 3
        > create table Supply (PNum int, Qty int);
        CREATE TABLE
        > insert into Supply values (3,4),(3,2),(10,1);
        INSERT 0 3
        > select pnum                                  
        from Part p
        where orderOnHand
             in (select count(*) from Supply s
                  where s.pnum = p.pnum
                 );
         pnum 
        ------
           10
            8
        (2 rows)
        
        Show
        julianhyde Julian Hyde added a comment - This is fixed in Calcite as part of CALCITE-365 . It's worth re-evaluating when Hive picks up Calcite 1.12. By the way, it's ambiguous in Harish Butani 's description above, but the query should return two rows (yes, including PNum=8). I confirmed on Postgres: > create table Part (PNum int, OrderOnHand int); CREATE TABLE > insert into Part values (3,6),(10,1),(8,0); INSERT 0 3 > create table Supply (PNum int, Qty int); CREATE TABLE > insert into Supply values (3,4),(3,2),(10,1); INSERT 0 3 > select pnum from Part p where orderOnHand in (select count(*) from Supply s where s.pnum = p.pnum ); pnum ------ 10 8 (2 rows)

          People

          • Assignee:
            Unassigned
            Reporter:
            rhbutani Harish Butani
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development