Hive / HIVE-14980

Minor compaction deletes data when triggered simultaneously on the same table/partition


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Bug
    • Affects Version/s: 2.1.0
    • Fix Version/s: None
    • Component/s: Metastore, Transactions
    • Labels: None
    • Flags: Patch, Important

    Description

      I have two tables (TABLEA and TABLEB). If I manually trigger a compaction after each INSERT into TABLEB from TABLEA, the compactions run asynchronously on whichever metastore instance picks them up, step on each other, and cause data to be deleted.

      Example (TABLEA has 10k rows):

      insert into mj.tableb select * from mj.tablea;
      alter table mj.tableb compact 'MINOR';
      insert into mj.tableb select * from mj.tablea;
      alter table mj.tableb compact 'MINOR';

      Once all the compactions are complete, I expect to see 20k rows in TABLEB, but I see only 10k. Only the rows inserted immediately before the last compaction survive; the older rows are lost. I believe the older delta files are being deleted.

      To confirm the bug: if I run only one compaction after both inserts, I see 20k rows in TABLEB.

      Proposed Fix:
      I have identified the bug in the code. The fix requires an additional check in the org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active compactions on the same table/partition. I will share the details of the fix once I have tested it.
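      The guard the reporter proposes (refuse to start a compaction while another is active on the same table/partition) can be sketched as follows. This is a minimal, hypothetical illustration: `CompactionGuard`, `tryStart`, and `finish` are invented names, not part of Hive's `Worker` class or metastore API, and the set of active compactions is held in memory. With several metastore instances, as in this report, the record of running compactions would have to live in shared state (for example the metastore database) for such a check to work. The issue was resolved as Not A Bug, so this reflects the reporter's proposal, not an accepted change.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard illustrating the proposed check; NOT Hive's actual API.
// In-memory only: a multi-metastore deployment would need shared storage instead.
class CompactionGuard {
    // Keys of targets with a compaction in progress, e.g. "mj.tableb" or "mj.tableb/ds=1".
    private final Set<String> active = new HashSet<>();

    // Marks the target active and returns true if no compaction is running on it;
    // returns false if one is already in progress, so the caller should skip or retry later.
    public synchronized boolean tryStart(String db, String table, String partition) {
        return active.add(key(db, table, partition));
    }

    // Marks the compaction on the target as finished so the next one may start.
    public synchronized void finish(String db, String table, String partition) {
        active.remove(key(db, table, partition));
    }

    // Builds the lookup key; partition may be null for an unpartitioned table.
    private static String key(String db, String table, String partition) {
        String base = db + "." + table;
        return partition == null ? base : base + "/" + partition;
    }

    public static void main(String[] args) {
        CompactionGuard guard = new CompactionGuard();
        boolean first = guard.tryStart("mj", "tableb", null);
        boolean second = guard.tryStart("mj", "tableb", null); // rejected: first is still active
        System.out.println(first + " " + second);
        guard.finish("mj", "tableb", null);
        System.out.println(guard.tryStart("mj", "tableb", null)); // allowed after finish
    }
}
```

      Running `main` prints `true false` and then `true`: the second start on `mj.tableb` is rejected while the first is active and allowed once it finishes. Different partitions map to different keys, so compactions on separate partitions do not block each other.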


People

    • Assignee: Mahipal Jupalli
    • Reporter: Mahipal Jupalli
    • Votes: 0
    • Watchers: 5


Time Tracking

    • Original Estimate: 96h
    • Remaining Estimate: 96h
    • Time Spent: Not Specified