HIVE-22661

Compaction fails on non bucketed table with data loaded inpath


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: None
    • Labels: None

    Description

      Compaction fails when all of the following hold:

      • data was ingested with LOAD DATA INPATH
      • the load was run more than once, and
      • the loads created different numbers of files in their delta directories

      For example, consider the following file/directory structure:

      /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000
      /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000000_0
      /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000001_0
      /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000
      /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000000_0
      /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000001_0
      /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000002_0 

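      Although the table is not bucketed, a bucket ID is still derived from each raw file's name, so 000002_0 is treated as 'bucket' 2. In the layout above, compaction fails because delta_0000001_0000001_0000 has no data for 'bucket' 2, while delta_0000002_0000002_0000 does.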
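
      The mismatch can be illustrated with a small, self-contained sketch. This is not Hive's compactor code: the class RawFileBuckets, its methods, and the <bucket>_<copy> file name pattern are assumptions made only for illustration. It derives a 'bucket' ID from each raw file name and reports, per delta directory, the IDs that appear in another delta but are absent from this one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive.
class RawFileBuckets {
    // Assumption: raw files are named <bucket>_<copy>, e.g. 000002_0.
    private static final Pattern RAW_FILE = Pattern.compile("(\\d+)_\\d+");

    /** Returns the bucket ID implied by a raw file name, or -1 if the name does not match. */
    static int bucketIdOf(String fileName) {
        Matcher m = RAW_FILE.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Groups the implied bucket IDs by the name of the parent (delta) directory. */
    static Map<String, SortedSet<Integer>> bucketsPerDelta(List<String> paths) {
        Map<String, SortedSet<Integer>> perDelta = new LinkedHashMap<>();
        for (String path : paths) {
            String[] parts = path.trim().split("/");
            if (parts.length < 2) {
                continue;
            }
            // Directory entries end in delta_..., which does not match the pattern and is skipped.
            int bucket = bucketIdOf(parts[parts.length - 1]);
            if (bucket >= 0) {
                perDelta.computeIfAbsent(parts[parts.length - 2], k -> new TreeSet<>()).add(bucket);
            }
        }
        return perDelta;
    }

    /** For each delta, lists bucket IDs seen in some delta but absent from this one. */
    static Map<String, SortedSet<Integer>> missingBuckets(Map<String, SortedSet<Integer>> perDelta) {
        SortedSet<Integer> all = new TreeSet<>();
        perDelta.values().forEach(all::addAll);
        Map<String, SortedSet<Integer>> missing = new LinkedHashMap<>();
        for (Map.Entry<String, SortedSet<Integer>> e : perDelta.entrySet()) {
            SortedSet<Integer> gap = new TreeSet<>(all);
            gap.removeAll(e.getValue());
            if (!gap.isEmpty()) {
                missing.put(e.getKey(), gap);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String base = "/warehouse/tablespace/managed/hive/comp3/";
        List<String> paths = List.of(
            base + "delta_0000001_0000001_0000",
            base + "delta_0000001_0000001_0000/000000_0",
            base + "delta_0000001_0000001_0000/000001_0",
            base + "delta_0000002_0000002_0000",
            base + "delta_0000002_0000002_0000/000000_0",
            base + "delta_0000002_0000002_0000/000001_0",
            base + "delta_0000002_0000002_0000/000002_0");
        System.out.println(missingBuckets(bucketsPerDelta(paths)));
    }
}
```

      For the listing above, the sketch reports that delta_0000001_0000001_0000 lacks 'bucket' 2, which matches the failure described in this issue.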

      Steps to reproduce using a small dataset:

      set tez.grouping.min-size=8;
      set tez.grouping.max-size=8;
      set mapreduce.input.fileinputformat.split.minsize=8;
      set mapreduce.input.fileinputformat.split.maxsize=8;
      
      create external table comp0 (a string);
      insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
      insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
      
      create external table comp1 stored as orc as select * from comp0;
      
      insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
      create external table comp2 stored as orc as select * from comp0;
      
      create table comp3 (a string);
      load data inpath '/warehouse/tablespace/external/hive/comp1' into table comp3;
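      load data inpath '/warehouse/tablespace/external/hive/comp2' into table comp3;

      -- Not part of the original report: compaction then has to be triggered, for example with
      alter table comp3 compact 'major';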

      Attachments

        1. HIVE-22661.2.patch
          8 kB
          Ádám Szita
        2. HIVE-22661.1.patch
          8 kB
          Ádám Szita
        3. HIVE-22661.0.patch
          8 kB
          Ádám Szita


          People

            Assignee: Ádám Szita (szita)
            Reporter: Ádám Szita (szita)