Details
- Sub-task
- Status: Closed
- Critical
- Resolution: Fixed
- 2.3.0
- None
- n/a
Description
The constructor of VectorizedOrcAcidRowBatchReader has:
    // Clone readerOptions for deleteEvents.
    Reader.Options deleteEventReaderOptions = readerOptions.clone();
    // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX because
    // we always want to read all the delete delta files.
    deleteEventReaderOptions.range(0, Long.MAX_VALUE);
This is suboptimal since base and deltas are sorted by ROW__ID. So for each split of the base we can find the min/max ROW__ID and only load delete events from the deltas that fall in the [min, max] range. This will reduce the number of delete events we load in memory (to no more than the number of rows in the split).
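A minimal sketch of the proposed filtering, not the actual Hive code. RowKey is a hypothetical stand-in for ROW__ID (it is not Hive's RecordIdentifier), ordered by write id, then bucket, then row id, and the class and method names are made up for illustration. A null bound is treated as unbounded, which falls back to the current "load everything" behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DeleteEventRangeFilter {

  // Hypothetical stand-in for ROW__ID: (writeId, bucket, rowId).
  static final class RowKey implements Comparable<RowKey> {
    final long writeId;
    final int bucket;
    final long rowId;

    RowKey(long writeId, int bucket, long rowId) {
      this.writeId = writeId;
      this.bucket = bucket;
      this.rowId = rowId;
    }

    @Override
    public int compareTo(RowKey other) {
      int c = Long.compare(writeId, other.writeId);
      if (c != 0) {
        return c;
      }
      c = Integer.compare(bucket, other.bucket);
      if (c != 0) {
        return c;
      }
      return Long.compare(rowId, other.rowId);
    }

    @Override
    public String toString() {
      return "(" + writeId + "," + bucket + "," + rowId + ")";
    }
  }

  // Keeps only delete events whose key lies in [minKey, maxKey].
  // The input must already be sorted by key; a null bound means "unbounded".
  static List<RowKey> filter(List<RowKey> sortedDeleteEvents, RowKey minKey, RowKey maxKey) {
    List<RowKey> kept = new ArrayList<>();
    for (RowKey key : sortedDeleteEvents) {
      if (minKey != null && key.compareTo(minKey) < 0) {
        continue; // event targets a row before this split
      }
      if (maxKey != null && key.compareTo(maxKey) > 0) {
        break; // sorted input: no later event can fall inside the split
      }
      kept.add(key);
    }
    return kept;
  }

  public static void main(String[] args) {
    List<RowKey> deleteEvents = Arrays.asList(
        new RowKey(1, 0, 5), new RowKey(1, 0, 10), new RowKey(2, 0, 3),
        new RowKey(2, 0, 7), new RowKey(3, 0, 0));
    // Split of the base covering rows from (1,0,10) through (2,0,3).
    System.out.println(filter(deleteEvents, new RowKey(1, 0, 10), new RowKey(2, 0, 3)));
  }
}
```

Because the events are sorted, the loop can stop at the first key past the upper bound, so memory is bounded by the events that actually target rows in the split.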
When we support sorting on PK, the same should apply, but we'd need to make sure to store PKs in the ORC index.
See OrcRawRecordMerger.discoverKeyBounds().
hive.acid.key.index in the ORC footer has an index of ROW__IDs, so we should know min/max easily for any file written by OrcRecordUpdater.
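A rough sketch of how key bounds could be derived from such an index. It assumes the index value is a string with one "writeId,bucket,rowId;" entry per stripe holding the last key of that stripe; that layout is an assumption here and should be checked against OrcRecordUpdater. The class and method names are hypothetical, and this is not the discoverKeyBounds() implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AcidKeyIndexSketch {

  // Parses "w,b,r;w,b,r;" into one {writeId, bucket, rowId} triple per stripe.
  // Assumed format; each triple is taken to be the last key in its stripe.
  static List<long[]> parse(String keyIndex) {
    List<long[]> lastKeyPerStripe = new ArrayList<>();
    for (String entry : keyIndex.split(";")) {
      if (entry.isEmpty()) {
        continue;
      }
      String[] parts = entry.split(",");
      lastKeyPerStripe.add(new long[] {
          Long.parseLong(parts[0]), Long.parseLong(parts[1]), Long.parseLong(parts[2])});
    }
    return lastKeyPerStripe;
  }

  // Returns {exclusiveLowerBound, inclusiveUpperBound} for a split that covers
  // stripes firstStripe..lastStripe. The lower bound is null when the split
  // starts at stripe 0, meaning there is no lower bound.
  static long[][] bounds(List<long[]> lastKeyPerStripe, int firstStripe, int lastStripe) {
    long[] lower = firstStripe == 0 ? null : lastKeyPerStripe.get(firstStripe - 1);
    long[] upper = lastKeyPerStripe.get(lastStripe);
    return new long[][] {lower, upper};
  }

  public static void main(String[] args) {
    List<long[]> index = parse("1,0,99;1,0,199;2,0,49;");
    long[][] b = bounds(index, 1, 2);
    System.out.println(Arrays.toString(b[0]) + " .. " + Arrays.toString(b[1]));
  }
}
```

Since the index only records the last key of each stripe, the lower bound for a split comes from the previous stripe and is exclusive, while the upper bound is inclusive.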
Attachments
Issue Links
- blocks
  - HIVE-20635 VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files (Closed)
- causes
  - HIVE-22318 Java.io.exception:Two readers for (Open)
  - HIVE-23143 Transactions: PPD in Delete deltas is broken (Open)
- is blocked by
  - HIVE-18662 hive.acid.key.index is missing entries (Closed)
- is related to
  - HIVE-20694 Additional unit tests for VectorizedOrcAcidRowBatchReader min max key evaluation (Closed)
  - HIVE-17320 OrcRawRecordMerger.discoverKeyBounds logic can be simplified (Open)
  - HIVE-17284 remove OrcRecordUpdater.deleteEventIndexBuilder (Resolved)
  - HIVE-17458 VectorizedOrcAcidRowBatchReader doesn't handle 'original' files (Closed)
- relates to
  - HIVE-17231 ColumnizedDeleteEventRegistry.DeleteReaderValue optimization (Closed)
  - HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching (Resolved)
  - HIVE-19985 ACID: Skip decoding the ROW__ID sections for read-only queries (Closed)
  - HIVE-20604 Minor compaction disables ORC column stats (Closed)
- links to