Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-10996

Aggregation / Projection over Multi-Join Inner Query producing incorrect results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.0, 1.1.0, 1.3.0, 2.0.0
    • Fix Version/s: 1.1.1, 2.0.0, 1.2.2
    • Component/s: Query Planning
    • Labels:
      None

      Description

      We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression.

      The following query (Q1) produces no results:

      select s
      from (
        select last.*, action.st2, action.n
        from (
          select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
          from (select * from purchase_history) purchase
          join (select * from cart_history) mevt
          on purchase.s = mevt.s
          where purchase.timestamp > mevt.timestamp
          group by purchase.s, purchase.timestamp
        ) last
        join (select * from events) action
        on last.s = action.s and last.last_stage_timestamp = action.timestamp
      ) list;
      
      

      While this one (Q2) does produce results :

      select *
      from (
        select last.*, action.st2, action.n
        from (
          select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp
          from (select * from purchase_history) purchase
          join (select * from cart_history) mevt
          on purchase.s = mevt.s
          where purchase.timestamp > mevt.timestamp
          group by purchase.s, purchase.timestamp
        ) last
        join (select * from events) action
        on last.s = action.s and last.last_stage_timestamp = action.timestamp
      ) list;
      1	21	20	Bob	1234
      1	31	30	Bob	1234
      3	51	50	Jeff	1234
      
      

      The setup to test this is:

      create table purchase_history (s string, product string, price double, timestamp int);
      insert into purchase_history values ('1', 'Belt', 20.00, 21);
      insert into purchase_history values ('1', 'Socks', 3.50, 31);
      insert into purchase_history values ('3', 'Belt', 20.00, 51);
      insert into purchase_history values ('4', 'Shirt', 15.50, 59);
      
      create table cart_history (s string, cart_id int, timestamp int);
      insert into cart_history values ('1', 1, 10);
      insert into cart_history values ('1', 2, 20);
      insert into cart_history values ('1', 3, 30);
      insert into cart_history values ('1', 4, 40);
      insert into cart_history values ('3', 5, 50);
      insert into cart_history values ('4', 6, 60);
      
      create table events (s string, st2 string, n int, timestamp int);
      insert into events values ('1', 'Bob', 1234, 20);
      insert into events values ('1', 'Bob', 1234, 30);
      insert into events values ('1', 'Bob', 1234, 25);
      insert into events values ('2', 'Sam', 1234, 30);
      insert into events values ('3', 'Jeff', 1234, 50);
      insert into events values ('4', 'Ted', 1234, 60);
      
      

      I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue.

      I will note that if I convert the inner join to a table and select from that the issue does not appear.

      Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435

        Attachments

        1. HIVE-10996.09.patch
          78 kB
          Jesus Camacho Rodriguez
        2. HIVE-10996.08.patch
          77 kB
          Jesus Camacho Rodriguez
        3. HIVE-10996.07.patch
          75 kB
          Jesus Camacho Rodriguez
        4. HIVE-10996.06.patch
          76 kB
          Jesus Camacho Rodriguez
        5. HIVE-10996.05.patch
          73 kB
          Jesus Camacho Rodriguez
        6. HIVE-10996.04.patch
          72 kB
          Jesus Camacho Rodriguez
        7. HIVE-10996.03.patch
          72 kB
          Jesus Camacho Rodriguez
        8. HIVE-10996.02.patch
          40 kB
          Jesus Camacho Rodriguez
        9. HIVE-10996.01.patch
          38 kB
          Jesus Camacho Rodriguez
        10. HIVE-10996.patch
          43 kB
          Jesus Camacho Rodriguez
        11. explain_q2.txt
          7 kB
          Gautam Kowshik
        12. explain_q1.txt
          7 kB
          Gautam Kowshik

          Issue Links

            Activity

              People

              • Assignee:
                jcamachorodriguez Jesus Camacho Rodriguez
                Reporter:
                gkowshik Gautam Kowshik
              • Votes:
                1 Vote for this issue
                Watchers:
                11 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: