Hive / HIVE-23809

Data loss occurs when using the Tez engine to join tables with different bucketing_version values

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Hive, Tez
    • Labels:

      Description

      Test case:
      create table table_a (a int, b string, c string);
      create table table_b (a int, b string, c string);
      insert into table_a values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      insert into table_b values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      alter table table_a set tblproperties ("bucketing_version"='1');
      alter table table_b set tblproperties ("bucketing_version"='2');
      Hive SQL:
      set hive.auto.convert.join=false;
      set mapred.reduce.tasks=2;
      select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb on (ta.a = tb.a);

      set hive.execution.engine=mr;
      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 5    | e    |
      | 6    | f    |
      | 7    | g    |
      | 11   | a    |
      | 22   | b    |
      | 33   | c    |
      | 44   | d    |
      +------+------+
      With MR, all 7 rows are returned.

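      set hive.execution.engine=tez;
      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 6    | f    |
      | 5    | e    |
      | 11   | a    |
      | 33   | c    |
      +------+------+
      With Tez, only 4 of the 7 rows are returned; the rows with a = 7, 22, and 44 are missing.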
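
A toy model of the symptom, offered as a sketch rather than a confirmed diagnosis: if each side of a reduce-side join is shuffled with a different hash function, rows with the same key can land on different reducers and never meet. The functions hash_v1 and hash_v2 below are hypothetical stand-ins, not Hive's actual bucketing functions, so the surviving rows in this model will not match the Tez output in this report.

```python
from collections import defaultdict

# Same (a, b) pairs as in the test case above.
ROWS = [(11, "a"), (22, "b"), (33, "c"), (44, "d"), (5, "e"), (6, "f"), (7, "g")]
NUM_REDUCERS = 2  # mirrors mapred.reduce.tasks=2


def hash_v1(key):
    # Hypothetical stand-in for one bucketing scheme: the integer itself.
    return key


def hash_v2(key):
    # Hypothetical stand-in for another scheme. NOT Hive's real function;
    # chosen only because it disagrees with hash_v1 on some keys.
    return key // 2


def shuffle(rows, hash_fn):
    # Route each row to a reducer based on the hash of its join key.
    parts = defaultdict(list)
    for key, value in rows:
        parts[hash_fn(key) % NUM_REDUCERS].append((key, value))
    return parts


def reduce_side_join(left_hash, right_hash):
    # Each reducer can only join the rows that were routed to it.
    left = shuffle(ROWS, left_hash)
    right = shuffle(ROWS, right_hash)
    joined = []
    for reducer in range(NUM_REDUCERS):
        right_by_key = dict(right[reducer])
        for key, _ in left[reducer]:
            if key in right_by_key:
                joined.append((key, right_by_key[key]))
    return sorted(joined)


same = reduce_side_join(hash_v1, hash_v1)   # both sides agree on routing
mixed = reduce_side_join(hash_v1, hash_v2)  # sides disagree on routing
print(len(same), len(mixed))
```

In this model the join is complete when both sides use the same function and silently drops rows when they do not, which is the same shape of failure as the Tez result above.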


        Attachments

        1. HIVE-23809.1.patch, 2 kB, ZhangQiDong

              People

              • Assignee: ZhangQiDong (zhangqidong)
              • Reporter: ZhangQiDong (zhangqidong)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  • Estimated: 12h
                  • Remaining: 12h
                  • Logged: Not Specified