Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-15880

Allow insert overwrite and truncate table query to use auto.purge table property

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0, 3.0.0
    • Component/s: None
    • Labels:
      None

      Description

      It seems inconsistent that auto.purge property is not considered when we do a INSERT OVERWRITE while it is when we do a DROP TABLE

      Drop table doesn't move table data to Trash when auto.purge is set to true

      > create table temp(col1 string, col2 string);
      No rows affected (0.064 seconds)
      > alter table temp set tblproperties('auto.purge'='true');
      No rows affected (0.083 seconds)
      > insert into temp values ('test', 'test'), ('test2', 'test2');
      No rows affected (25.473 seconds)
      
      # hdfs dfs -ls /user/hive/warehouse/temp
      Found 1 items
      -rwxrwxrwt   3 hive hive         22 2017-02-09 13:03 /user/hive/warehouse/temp/000000_0
      #
      
      > drop table temp;
      No rows affected (0.242 seconds)
      
      # hdfs dfs -ls /user/hive/warehouse/temp
      ls: `/user/hive/warehouse/temp': No such file or directory
      #
      # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
      #
      

      INSERT OVERWRITE query moves the table data to Trash even when auto.purge is set to true

      > create table temp(col1 string, col2 string);
      > alter table temp set tblproperties('auto.purge'='true');
      > insert into temp values ('test', 'test'), ('test2', 'test2');
      
      # hdfs dfs -ls /user/hive/warehouse/temp
      Found 1 items
      -rwxrwxrwt   3 hive hive         22 2017-02-09 13:07 /user/hive/warehouse/temp/000000_0
      #
      
      > insert overwrite table temp select * from dummy;
      
      # hdfs dfs -ls /user/hive/warehouse/temp
      Found 1 items
      -rwxrwxrwt   3 hive hive         26 2017-02-09 13:08 /user/hive/warehouse/temp/000000_0
      # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
      Found 1 items
      drwx------   - hive hive          0 2017-02-09 13:08 /user/hive/.Trash/Current/user/hive/warehouse/temp
      #
      

      While move operations are not very costly on HDFS it could be significant overhead on slow FileSystems like S3. This could improve the performance of INSERT OVERWRITE TABLE queries especially when there are large number of partitions on tables located on S3 should the user wish to set auto.purge property to true

      Similarly TRUNCATE TABLE query on a table with auto.purge property set true should not move the data to Trash

        Attachments

        1. HIVE-15880.01.patch
          7 kB
          Vihang Karajgaonkar
        2. HIVE-15880.02.patch
          22 kB
          Vihang Karajgaonkar
        3. HIVE-15880.03.patch
          25 kB
          Vihang Karajgaonkar
        4. HIVE-15880.04.patch
          23 kB
          Vihang Karajgaonkar
        5. HIVE-15880.05.patch
          35 kB
          Vihang Karajgaonkar
        6. HIVE-15880.06.patch
          36 kB
          Vihang Karajgaonkar

          Issue Links

            Activity

              People

              • Assignee:
                vihangk1 Vihang Karajgaonkar
                Reporter:
                vihangk1 Vihang Karajgaonkar
              • Votes:
                0 Vote for this issue
                Watchers:
                6 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: