Hive / HIVE-24235

Drop and recreate table during MR compaction leaves behind base/delta directory


Details

    Description

      If a table is dropped and recreated while an MR compaction is running, the compaction can still create the table directory and a base directory (or a delta directory, for minor compaction), with or without data, even though the table it was compacting no longer exists.

      E.g.

      create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
      insert into c values (9);
      insert into c values (9);
      alter table c compact 'major';
      
      -- While the compaction job is running:
      drop table c;
      create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
      

      The recreated table's directory should be empty, but after the job finishes it could look like this:

      Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
      Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
      Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
      

      or perhaps just the ACID version marker, with no data file:

      Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
      

      Insert another row into the recreated table and the directory contains:

      Oct  6 14:33 base_0000002_v0000101/
      Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
      Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
      Oct  6 14:33 base_0000002_v0000101/bucket_00000
      Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
      Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
      Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
      Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
      

      Selecting from the table then fails with the error below, because the highest valid writeId for the recreated table is 1 while the leftover base directory carries a higher writeId:

      thrift.ThriftCLIService: Error fetching results: 
      org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
              at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
      ...
      Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x).  Oldest available base: .../warehouse/b/base_0000004_v0000092
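
      The mismatch behind this error can be illustrated with a small standalone check. This is a simplified sketch, not Hive's actual split-generation code: the class StaleBaseCheck and its methods are hypothetical names, and the only facts taken from the listing above are the directory names and the highest valid writeId of 1.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StaleBaseCheck {
    // Matches names such as base_0000002_v0000101; group 1 is the base writeId.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(_v\\d+)?");

    /** Returns the writeId encoded in a base directory name, or -1 if the name is not a base. */
    static long baseWriteId(String dirName) {
        Matcher m = BASE.matcher(dirName);
        return m.matches() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** True if any base directory carries a writeId above the table's highest valid writeId. */
    static boolean hasBaseAboveHighWatermark(List<String> dirNames, long highestValidWriteId) {
        for (String name : dirNames) {
            if (baseWriteId(name) > highestValidWriteId) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Directory names from the listing above; the recreated table has only writeId 1.
        List<String> dirs = List.of("base_0000002_v0000101", "delta_0000001_0000001_0000");
        System.out.println(hasBaseAboveHighWatermark(dirs, 1L)); // prints "true"
    }
}
```

      A base left behind by the old table's compaction is never usable by readers of the new table, since the new table's writeIds start over.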
      

      Solution: Resolve the table again after compaction finishes and compare its id with the table id recorded when compaction began. If the ids do not match, abort the compaction's transaction.
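
      The proposed check could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch: CompactionGuard, TableResolver, and CompactionTxn are hypothetical stand-ins for the compactor, the metastore lookup, and the compaction's transaction.

```java
public class CompactionGuard {
    /** Hypothetical metastore lookup: returns the table's current id, or null if it does not exist. */
    interface TableResolver {
        Long resolveTableId(String tableName);
    }

    /** Hypothetical stand-in for the compaction's transaction. */
    static class CompactionTxn {
        boolean committed;
        boolean aborted;

        void commit() { committed = true; }
        void abort() { aborted = true; }
    }

    /**
     * Re-resolves the table after the compaction job has finished and commits only if
     * the table id is unchanged since compaction began; otherwise aborts.
     * Returns true if the transaction was committed.
     */
    static boolean finish(TableResolver resolver, String tableName, long idAtStart, CompactionTxn txn) {
        Long idNow = resolver.resolveTableId(tableName);
        if (idNow == null || idNow != idAtStart) {
            // Table was dropped, or dropped and recreated: the compacted output is stale.
            txn.abort();
            return false;
        }
        txn.commit();
        return true;
    }

    public static void main(String[] args) {
        CompactionTxn txn = new CompactionTxn();
        // Table "c" had id 7 when compaction began, but resolves to id 8 afterwards.
        boolean committed = finish(name -> 8L, "c", 7L, txn);
        System.out.println(committed + " " + txn.aborted); // prints "false true"
    }
}
```

      Comparing ids rather than names is the point of the check: the recreated table has the same name, so a name lookup alone cannot tell the two tables apart.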

          People

            klcopp Karen Coppage

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h