Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21387

Wrong result for UNION query with GROUP BY consisting of PK columns

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0
    • Component/s: Query Planning
    • Labels:
      None

      Description

      Reproducer

      create table t1(i int primary key disable rely, j int);
      insert into t1 values(1,100),(2,200);
      create table t2(i int primary key disable rely, j int);
      insert into t2 values(2,1000),(4,500);
      
      select i from (select i, j from t1 union all select i,j from t2) subq group by i,j;
      

      Expected Result

      2
      2
      4
      1
      

      Actual Result

      1
      2
      4
      

      CBO Plan

      HiveAggregate(group=[{0}])
        HiveProject(i=[$0], j=[$1])
          HiveUnion(all=[true])
            HiveProject(i=[$0], j=[$1])
              HiveTableScan(table=[[default, t1]], table:alias=[t1])
            HiveProject(i=[$0], j=[$1])
              HiveTableScan(table=[[default, t2]], table:alias=[t2])
      

      This is due to Group by reduction logic reducing keys when it shouldn't. Because of UNION relative cardinality of the group by keys are changed (they are not PK/UNIQUE anymore). Therefore we shouldn't be trying to reduce group by keys at all.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                vgarg Vineet Garg
                Reporter:
                vgarg Vineet Garg
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: