HIVE-21266 (project Hive; sub-task of HIVE-20699: Query based compactor for full CRUD Acid tables)

Don't run cleaner if compaction is skipped (issue with single delta file)


Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: Transactions
    • Labels: None

    Description

      https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L353-L357

       

      if ((deltaCount + (dir.getBaseDirectory() == null ? 0 : 1)) + origCount <= 1) {
        LOG.debug("Not compacting {}; current base is {} and there are {} deltas and {} originals",
            sd.getLocation(), dir.getBaseDirectory(), deltaCount, origCount);
        return;
      }
       

      This check is problematic.
      Suppose there is a single delta from streaming ingest, delta_11_20, in which txnid 13 was aborted. The check above skips the compaction, so the delta is not rewritten (a rewrite would drop everything that belongs to the aborted txn), yet the compaction is still transitioned to the "ready_for_cleaning" state. When it is cleaned, markCleaned() drops the metadata about the aborted txn. From then on, the aborted data is read as committed.
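To make the failure sequence concrete, here is a minimal, hypothetical Java model. SkippedCompactionDemo, Row, isSkipped, visibleRows and runCycle are invented for illustration and are not Hive APIs. The model assumes there is no base and no original files, treats the rows of all deltas as one list, and reduces the cleaner to clearing the set of aborted txns, which is the effect markCleaned() has on the metadata in this scenario. The cleanWhenSkipped=false path only models what the issue title asks for ("don't run cleaner if compaction is skipped"); it is not taken from the attached patches.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the HIVE-21266 problem. None of these
// names exist in Hive; they only mimic the skip check quoted from CompactorMR
// and the effect of markCleaned() dropping aborted-txn metadata.
public class SkippedCompactionDemo {

  // A row tagged with the id of the transaction that wrote it.
  record Row(long txnId, String value) {}

  // Same arithmetic as the CompactorMR check: skip when there is at most
  // one base, delta or original in total.
  static boolean isSkipped(int deltaCount, boolean hasBase, int origCount) {
    return deltaCount + (hasBase ? 1 : 0) + origCount <= 1;
  }

  // Readers hide rows whose writing txn is still recorded as aborted.
  static List<String> visibleRows(List<Row> rows, Set<Long> abortedTxns) {
    return rows.stream()
        .filter(r -> !abortedTxns.contains(r.txnId()))
        .map(Row::value)
        .collect(Collectors.toList());
  }

  // Simulates one compaction + cleaning cycle on a table with deltaCount
  // deltas and no base or originals; returns what a reader sees afterwards.
  // cleanWhenSkipped = true reproduces the behaviour reported in this issue.
  static List<String> runCycle(List<Row> rows, int deltaCount,
                               Set<Long> abortedTxns, boolean cleanWhenSkipped) {
    Set<Long> aborted = new HashSet<>(abortedTxns);
    List<Row> data = rows;
    boolean skipped = isSkipped(deltaCount, false, 0);
    if (!skipped) {
      // A real rewrite drops every row that belongs to an aborted txn.
      data = rows.stream()
          .filter(r -> !aborted.contains(r.txnId()))
          .collect(Collectors.toList());
    }
    if (!skipped || cleanWhenSkipped) {
      // Cleaning forgets the aborted txns. This is only safe if the data
      // was actually rewritten above.
      aborted.clear();
    }
    return visibleRows(data, aborted);
  }

  public static void main(String[] args) {
    // delta_11_20 from streaming ingest, where txn 13 was aborted.
    List<Row> delta = List.of(new Row(11, "a"), new Row(13, "aborted"), new Row(20, "b"));
    Set<Long> aborted = Set.of(13L);

    System.out.println("before compaction:  " + visibleRows(delta, aborted));          // [a, b]
    System.out.println("cleaned after skip: " + runCycle(delta, 1, aborted, true));    // [a, aborted, b]
    System.out.println("cleaner not run:    " + runCycle(delta, 1, aborted, false));   // [a, b]
  }
}
```

With a single delta the skipped compaction followed by cleaning makes the row from txn 13 visible again, while leaving the aborted-txn metadata in place keeps it hidden. With two or more deltas (deltaCount = 2) the rewrite removes the aborted row first, so cleaning afterwards is harmless.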

      Attachments

        1. HIVE-21266.01.patch
          11 kB
          Karen Coppage
        2. HIVE-21266.02.patch
          13 kB
          Karen Coppage
        3. HIVE-21266.03.patch
          15 kB
          Karen Coppage
        4. HIVE-21266.04.patch
          16 kB
          Karen Coppage


            People

              Assignee: Karen Coppage (klcopp)
              Reporter: Eugene Koifman (ekoifman)
              Votes: 0
              Watchers: 4
