Hive / HIVE-23809

Data loss occurs when using the Tez engine to join tables with different bucketing_version values

Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Hive, Tez

    Description

      Test case:
      create table table_a (a int, b string,c string);
      create table table_b (a int, b string,c string);
      insert into table_a values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      insert into table_b values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      alter table table_a set tblproperties ("bucketing_version"='1');
      alter table table_b set tblproperties ("bucketing_version"='2');
      Hive SQL:
      set hive.auto.convert.join=false;
      set mapred.reduce.tasks=2;
      select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb on(ta.a=tb.a);

      set hive.execution.engine=mr;
      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 5    | e    |
      | 6    | f    |
      | 7    | g    |
      | 11   | a    |
      | 22   | b    |
      | 33   | c    |
      | 44   | d    |
      +------+------+

      set hive.execution.engine=tez;
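      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 6    | f    |
      | 5    | e    |
      | 11   | a    |
      | 33   | c    |
      +------+------+

      With Tez, only 4 of the 7 rows returned by MR come back; the rows with a = 7, 22 and 44 are missing.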
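
      To narrow down which keys are dropped, a diagnostic along the following lines may help. It is only a sketch built on the tables and settings from the test case above; it assumes that an outer join goes through the same reduce-side shuffle as the inner join, so its output under Tez is not guaranteed to mirror the missing rows.

      ```sql
      -- Confirm which bucketing_version each table currently carries.
      show tblproperties table_a ("bucketing_version");
      show tblproperties table_b ("bucketing_version");

      -- Same settings as the failing run.
      set hive.execution.engine=tez;
      set hive.auto.convert.join=false;
      set mapred.reduce.tasks=2;

      -- Both tables hold identical keys, so a correct join leaves no
      -- unmatched rows. Any row returned here is a key from table_a
      -- that failed to meet its partner from table_b.
      select ta.a as unmatched_a
      from table_a ta
      left join table_b tb on (ta.a = tb.a)
      where tb.a is null;
      ```

      Running the same query with hive.execution.engine=mr gives a baseline for comparison.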


      Attachments

        1. HIVE-23809.1.patch
          2 kB
          ZhangQiDong

            People

              Assignee: ZhangQiDong (zhangqidong)
              Reporter: ZhangQiDong (zhangqidong)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: 12h
                  Remaining: 12h
                  Logged: Not Specified