Hive / HIVE-23809

Data loss occurs when using the Tez engine to join tables with different bucketing_version values


    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Hive, Tez
    • Labels:

      Description

      Test case:
      create table table_a (a int, b string,c string);
      create table table_b (a int, b string,c string);
      insert into table_a values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      insert into table_b values (11,'a','aa'),(22,'b','bb'),(33,'c','cc'),(44,'d','dd'),(5,'e','ee'),(6,'f','ff'),(7,'g','gg');
      alter table table_a set tblproperties ("bucketing_version"='1');
      alter table table_b set tblproperties ("bucketing_version"='2');
      Hive SQL:
      set hive.auto.convert.join=false;
      set mapred.reduce.tasks=2;
      select ta.a as a_a, tb.b as b_b from table_a ta join table_b tb on(ta.a=tb.a);

      set hive.execution.engine=mr;
      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 5    | e    |
      | 6    | f    |
      | 7    | g    |
      | 11   | a    |
      | 22   | b    |
      | 33   | c    |
      | 44   | d    |
      +------+------+

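      set hive.execution.engine=tez;
      +------+------+
      | a_a  | b_b  |
      +------+------+
      | 6    | f    |
      | 5    | e    |
      | 11   | a    |
      | 33   | c    |
      +------+------+

      With Tez, only 4 of the 7 expected rows are returned; the rows with a = 7, 22, and 44 are missing.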
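
A possible explanation, offered as an assumption and not confirmed anywhere in this report: if each side of the join is shuffled with the hash function that belongs to its own table's bucketing_version, then equal keys can be routed to different reducers and never meet. The Python sketch below simulates that with two reducers. hash_v1 and hash_v2 are hypothetical stand-ins (neither is Hive's actual implementation), and the helper names are made up for illustration, so the specific rows that go missing here need not match the Tez output above.

```python
NUM_REDUCERS = 2  # mirrors "set mapred.reduce.tasks=2" in the test case

# (a, b) pairs from the test case; both tables hold the same rows.
ROWS = [(11, "a"), (22, "b"), (33, "c"), (44, "d"), (5, "e"), (6, "f"), (7, "g")]


def hash_v1(key):
    """Hypothetical stand-in for a version-1 style hash: the int key itself."""
    return key


def hash_v2(key):
    """Hypothetical stand-in for a version-2 style hash.

    Knuth-style multiplicative mixing, NOT Hive's real function; any hash
    that differs from hash_v1 is enough to show the effect.
    """
    return ((key * 2654435761) & 0xFFFFFFFF) >> 16


def shuffle(rows, hash_fn):
    """Route each row to a reducer by hashing its join key."""
    partitions = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        partitions[hash_fn(row[0]) % NUM_REDUCERS].append(row)
    return partitions


def reduce_side_join(left_hash, right_hash):
    """Join ROWS with itself; each reducer only sees its own partitions."""
    left = shuffle(ROWS, left_hash)
    right = shuffle(ROWS, right_hash)
    joined = []
    for reducer in range(NUM_REDUCERS):
        right_by_key = {key: value for key, value in right[reducer]}
        for key, _ in left[reducer]:
            if key in right_by_key:
                joined.append((key, right_by_key[key]))
    return sorted(joined)


consistent = reduce_side_join(hash_v1, hash_v1)  # same hash on both sides
mixed = reduce_side_join(hash_v1, hash_v2)       # different hash per side

print("same hash on both sides:", len(consistent), "rows")
print("different hash per side:", len(mixed), "rows")
```

When both sides use the same hash, every key lands on the same reducer on both sides and all rows join. When the hashes differ, only keys that happen to map to the same reducer under both functions survive, which is the same silent, partial loss pattern as in the Tez result.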

       
       
       
       
       
       

        Attachments

        1. HIVE-23809.1.patch
          2 kB
          ZhangQiDong



              Time Tracking

              • Original Estimate: 12h
              • Remaining Estimate: 12h
              • Time Spent: Not Specified
