Hive / HIVE-23143

Transactions: PPD in Delete deltas is broken

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

      Description

      The optimization introduced in HIVE-16812 appears to be broken. Predicate pushdown (PPD) is not applied to delete deltas, and the optimization also returns wrong results when a data column name conflicts with one of the ACID ROW__ID column names (bucket, originalTransactionId, etc.).

      This seems to happen because, after ORC-491, all PPD on ACID ORC files is applied to data columns only. The filters built for delete-delta PPD are therefore never applied to the metadata columns; they are applied to the data columns instead. When a data column has the same name as a metadata column (like "bucket" in the example below), the query returns wrong results.

      Steps to repro:

      set hive.fetch.task.conversion=none;
      set hive.query.results.cache.enabled=false;
      create table test(a int, bucket int) stored as orc tblproperties("transactional"="true");
      insert into table test values (1, 1111), (2, 2222), (3, 3333);
      delete from test where a = 2;
      select * from test; -- wrong: returns the deleted row as well
      set hive.txn.filter.delete.events=false;
      select * from test; -- correct: does not return the deleted row
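
      A control case may help confirm that the wrong result depends on the column name. The sketch below repeats the steps with a hypothetical table test_ctrl whose second column is named b instead of bucket; the table and column names are assumptions for illustration and are not part of the original report. If the analysis above holds, the select here should not return the deleted row while the delete-event filter is enabled, but this has not been verified.

      ```sql
      set hive.fetch.task.conversion=none;
      set hive.query.results.cache.enabled=false;
      -- Keep the delete-event filter on, as in the failing case above
      set hive.txn.filter.delete.events=true;
      -- Hypothetical control table: same layout, but no column name
      -- clashes with the ACID metadata columns
      create table test_ctrl(a int, b int) stored as orc tblproperties("transactional"="true");
      insert into table test_ctrl values (1, 1111), (2, 2222), (3, 3333);
      delete from test_ctrl where a = 2;
      -- Compare this result with the first select on table test
      select * from test_ctrl;
      ```

      If both tables return the deleted row, the name conflict is probably not the only trigger.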
      

      cc Peter Vary, Gopal Vijayaraghavan

              People

              • Assignee: Unassigned
              • Reporter: Abhishek Somani (asomani)
              • Votes: 0
              • Watchers: 3

                Dates

                • Created:
                  Updated: