Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-22227

Tez bucket pruning produces wrong result with shared work optimization

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 4.0.0
    • 4.0.0-alpha-1
    • Query Planning
    • None

    Description

      Reproducer

      set hive.tez.bucket.pruning=true;
      set hive.optimize.shared.work=true;
      
      CREATE TABLE srcbucket_mapjoin_n16(key int, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;
      CREATE TABLE tab_part_n10 (key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS STORED AS ORCFILE;
      CREATE TABLE srcbucket_mapjoin_part_n17 (key int, value string) partitioned by (ds string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE;
      
      load data local inpath '$HIVE_SRC/data/files/bmj/000000_0' INTO TABLE srcbucket_mapjoin_n16 partition(ds='2008-04-08');
      load data local inpath '.$HIVE_SRC/data/files/bmj1/000001_0' INTO TABLE srcbucket_mapjoin_n16 partition(ds='2008-04-08');
      
      load data local inpath '$HIVE_SRC/data/files/bmj/000000_0' INTO TABLE srcbucket_mapjoin_part_n17 partition(ds='2008-04-08');
      load data local inpath '$HIVE_SRC/data/files/bmj/000001_0' INTO TABLE srcbucket_mapjoin_part_n17 partition(ds='2008-04-08');
      load data local inpath '$HIVE_SRC/data/files/bmj/000002_0' INTO TABLE srcbucket_mapjoin_part_n17 partition(ds='2008-04-08');
      
      
      set hive.optimize.bucketingsorting=false;
      insert overwrite table tab_part_n10 partition (ds='2008-04-08')
      select key,value from srcbucket_mapjoin_part_n17;
      
      CREATE TABLE tab_n9(key int, value string) PARTITIONED BY(ds STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS ORCFILE;
      insert overwrite table tab_n9 partition (ds='2008-04-08')
      select key,value from srcbucket_mapjoin_n16;
      
      select * from
      (select * from tab_n9 where tab_n9.key = 0)a
      join
      (select * from tab_part_n10 where tab_part_n10.key = 98)b full outer join tab_part_n10 c on a.key = b.key and b.key = c.key
      order by 1,2,3,4,5,6,7,8,9;
      

      Attachments

        1. HIVE-22227.1.patch
          2 kB
          Vineet Garg

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            vgarg Vineet Garg Assign to me
            vgarg Vineet Garg
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment