HIVE-3734 (project: Hive)

Static partition DML create duplicate files and records


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: Query Processor
    • Labels: None

    Description

      Static-partition DML creates duplicate files and records.

      Given the test case below, Hive returns 2 records from testtable:
      484 val_484
      484 val_484

      but the same query against srcpart returns one record:
      484 val_484

      Looking at the file system, the DML generates a duplicate file with the same content:
      -rw-r--r-- 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000000_0
      -rwxr-xr-x 1 gang THEFACEBOOK\Domain Users 5812 Nov 21 17:55 000001_0

      Test Case
      ===
      set hive.mapred.supports.subdirectories=true;
      set hive.exec.dynamic.partition=true;
      set hive.exec.dynamic.partition.mode=nonstrict;
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      set hive.merge.mapfiles=false;
      set hive.merge.mapredfiles=false;
      set mapred.input.dir.recursive=true;

      create table testtable (key String, value String) partitioned by (ds String, hr String) ;

      explain extended
      insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08';
      insert overwrite table testtable partition (ds='2008-04-08', hr='11') select key, value from srcpart where ds='2008-04-08';

      desc formatted testtable partition (ds='2008-04-08', hr='11');

      select count(1) from srcpart where ds='2008-04-08';
      select count(1) from testtable where ds='2008-04-08';

      select key, value from srcpart where ds='2008-04-08' and hr='11' and key = "484";
      explain extended
      select key, value from testtable where ds='2008-04-08' and hr='11' and key = "484";
      select key, value from testtable where ds='2008-04-08' and hr='11' and key = "484";
      ===
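      The INSERT in the test case filters srcpart on ds='2008-04-08' only, while the srcpart query used for comparison also filters on hr='11'. The sketch below is a hypothetical like-for-like check, not part of the original report: it applies the insert's own filter, and assumes srcpart is the same test table used above. If srcpart holds more than one hr partition for that date, the first query would be expected to return one row per partition that contains key 484.

      ```sql
      -- Hypothetical like-for-like check, not part of the original test case.
      -- Same filter as the INSERT above: ds only, every hr partition.
      select hr, key, value
      from srcpart
      where ds='2008-04-08' and key = '484';

      -- Rows per source partition, to compare against the number and size
      -- of the data files written into testtable's single partition.
      select hr, count(1)
      from srcpart
      where ds='2008-04-08'
      group by hr;
      ```

      If the hr values and per-partition counts line up with the two files 000000_0 and 000001_0, the second 484 row in testtable would come from the insert's wider filter rather than from a duplicated write.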


          People

            Assignee: Gang Tim Liu
            Reporter: Gang Tim Liu
            Votes: 0
            Watchers: 3
