HIVE-24235 (project: Hive)

Drop and recreate table during MR compaction leaves behind base/delta directory


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: None

      Description

      If a table is dropped and recreated while an MR compaction job is running on it, the job can still create the table directory and a base directory (or a delta directory, for minor compaction), with or without data, even though the table being compacted "does not exist" anymore.

      For example:

      create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
      insert into c values (9);
      insert into c values (9);
      alter table c compact 'major';

      -- While the compaction job is still running:
      drop table c;
      create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
      

      The new table's directory should be empty, but after the job finishes it can look like this:

      Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
      Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
      Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
      

      Or, in some cases, only:

      Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
      

      After another row is inserted, the directory contains:

      Oct  6 14:33 base_0000002_v0000101/
      Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
      Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
      Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
      Oct  6 14:33 base_0000002_v0000101/bucket_00000
      Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
      Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
      Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
      Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
      

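      Selecting from the table then fails with the following error, because the highest valid writeId for the new table is 1, lower than the writeId (2) in the name of the leftover base_0000002_v0000101 directory: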

      thrift.ThriftCLIService: Error fetching results: 
      org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
              at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
      ...
      Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x).  Oldest available base: .../warehouse/b/base_0000004_v0000092
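To illustrate why the leftover base_0000002_v0000101 breaks reads of the new table, here is a simplified Java model. It is not Hive's reader code; it assumes only that a base directory is named base_<writeId> with an optional _v<visibilityTxnId> suffix, that a reader skips any base whose writeId is above its high watermark, and that finding bases but no usable one is treated as missing history. Visibility-transaction checks and delta handling are deliberately left out, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified model of choosing a base directory at read time. Illustrative
 * only: not Hive's reader code, and visibility-txn checks are ignored.
 */
class LeftoverBaseModel {
    // Assumed naming: base_<writeId> with an optional _v<visibilityTxnId> suffix.
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

    /** Returns the writeId encoded in a base directory name, or -1 if the name is not a base. */
    static long baseWriteId(String dirName) {
        Matcher m = BASE.matcher(dirName);
        return m.matches() ? Long.parseLong(m.group(1)) : -1;
    }

    /**
     * Returns the newest base usable at the reader's high watermark, or null if
     * there are no bases at all (read deltas only). Throws if bases exist but
     * every one of them is newer than the high watermark.
     */
    static String chooseBase(List<String> dirs, long highWatermark) {
        String best = null, oldest = null;
        long bestId = -1, oldestId = Long.MAX_VALUE;
        for (String d : dirs) {
            long id = baseWriteId(d);
            if (id < 0) continue;                 // deltas are not bases
            if (id < oldestId) { oldestId = id; oldest = d; }
            if (id <= highWatermark && id > bestId) { bestId = id; best = d; }
        }
        if (best == null && oldest != null) {
            // The reader cannot tell whether writes <= highWatermark were compacted away.
            throw new IllegalStateException(
                "no base usable at high watermark " + highWatermark + "; oldest base is " + oldest);
        }
        return best;
    }
}
```

With the directory listing above, chooseBase(Arrays.asList("base_0000002_v0000101", "delta_0000001_0000001_0000"), 1) throws in this model, which mirrors the failure in the log: the only base is "newer" than anything the new table has written.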
      

      Solution: After the compaction job finishes, resolve the table again and compare its id with the table id recorded when compaction began. If the ids do not match, abort the compaction's transaction.
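The proposed solution boils down to a compare-and-abort decision at the end of the job. The following is a minimal Java sketch of that decision only, assuming a hypothetical Catalog lookup that returns a table's current id by name (or null once it is dropped). Catalog, currentTableId, onJobFinished and the ids used in main are illustrative names and values, not Hive APIs, and the sketch does not reproduce the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the check proposed for HIVE-24235. Catalog,
 * currentTableId and onJobFinished are illustrative names, not Hive APIs.
 */
class CompactionGuard {

    /** Hypothetical metastore lookup: returns the table's current id, or null if it does not exist. */
    interface Catalog {
        Long currentTableId(String dbName, String tableName);
    }

    enum Outcome { COMMIT, ABORT }

    /**
     * Called after the compaction job finishes. idAtStart is the table id
     * recorded when compaction began; the table is resolved again by name
     * and the compaction's transaction is committed only if the ids match.
     */
    static Outcome onJobFinished(Catalog catalog, String db, String table, long idAtStart) {
        Long idNow = catalog.currentTableId(db, table);
        if (idNow == null || idNow != idAtStart) {
            // Dropped, or dropped and recreated under the same name:
            // the job's output belongs to a table that no longer exists.
            return Outcome.ABORT;
        }
        return Outcome.COMMIT;
    }

    public static void main(String[] args) {
        Map<String, Long> ids = new HashMap<>();
        ids.put("default.c", 42L);                  // hypothetical id of the original table c
        Catalog catalog = (db, t) -> ids.get(db + "." + t);

        long idAtStart = catalog.currentTableId("default", "c");
        ids.put("default.c", 43L);                  // drop table c; create table c (new id)
        System.out.println(onJobFinished(catalog, "default", "c", idAtStart)); // ABORT
    }
}
```

Comparing ids rather than names is what makes the check work: after drop table c; create table c ..., the name c resolves again, but to a table with a different id.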


              People

              • Assignee: Karen Coppage (klcopp)
                Reporter: Karen Coppage (klcopp)
              • Votes: 0
                Watchers: 1


                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h