Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
The optimization introduced in HIVE-16812 appears to be broken. Predicate pushdown (PPD) is not applied to delete deltas, and the optimization also returns wrong results when a data column name conflicts with one of the ACID ROW__ID column names (bucket, originalTransactionId, etc.).
This seems to happen because, after ORC-491, PPD for ACID ORC files resolves column names against the data columns only. The filters built for delete-event PPD are therefore never applied to the metadata columns. When no data column has a matching name, the filter is not applied at all. When a data column does have a matching name (like "bucket" in the example below), the filter is applied to that data column instead, and the query returns wrong results.
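The name-resolution problem described above can be pictured with a small model. This is only a sketch, not Hive or ORC code: the function names are hypothetical, and the metadata column names follow the ORC ACID file layout as I understand it (operation, originalTransaction, bucket, rowId, currentTransaction, with user columns nested under a "row" struct), which is an assumption not stated in this report.

```python
# Illustrative model only; none of these functions exist in Hive or ORC.
# Assumed ACID ORC layout: metadata columns at the top level, user data under "row".
ACID_META_COLUMNS = [
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
]


def build_acid_schema(data_columns):
    """Model an ACID file schema as a dict with user columns nested under 'row'."""
    schema = {name: None for name in ACID_META_COLUMNS}
    schema["row"] = {name: None for name in data_columns}
    return schema


def resolve_top_level(schema, name):
    """Resolve a filter column among the top-level metadata columns."""
    if name != "row" and name in schema:
        return (name,)
    return None


def resolve_in_row(schema, name):
    """Resolve a filter column one struct deeper, among the data columns only."""
    if name in schema["row"]:
        return ("row", name)
    return None


conflicting = build_acid_schema(["a", "bucket"])  # table from the repro below
plain = build_acid_schema(["a", "b"])             # hypothetical table, no name clash

# What a delete-event filter on "bucket" is meant to target:
print(resolve_top_level(conflicting, "bucket"))   # ('bucket',)
# Data-column-only lookup picks the user's column instead:
print(resolve_in_row(conflicting, "bucket"))      # ('row', 'bucket')
# Without a clash the column is not found, so the filter cannot be applied:
print(resolve_in_row(plain, "bucket"))            # None
```

In this model, the second lookup corresponds to the wrong-results case and the third to the case where PPD silently does not happen.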
Steps to repro:
set hive.fetch.task.conversion=none;
set hive.query.results.cache.enabled=false;
create table test(a int, bucket int) stored as orc tblproperties("transactional"="true");
insert into table test values (1, 1111), (2, 2222), (3, 3333);
delete from test where a = 2;
select * from test; -- Wrong results: the deleted row is returned as well
set hive.txn.filter.delete.events=false;
select * from test; -- Correct results: the deleted row is not returned
Issue Links
- is caused by
  - HIVE-16812 VectorizedOrcAcidRowBatchReader doesn't filter delete events (Closed)
  - ORC-491 PPD: Column name lookups need to look a struct deeper for ACID (Closed)