[HIVE-16812] VectorizedOrcAcidRowBatchReader doesn't filter delete events - ASF JIRA

Details

Type: Sub-task
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.3.0
Fix Version/s: 4.0.0-alpha-1
Component/s: Transactions
Labels: None
Target Version/s: 3.0.0
Release Note: n/a

Description

The constructor of VectorizedOrcAcidRowBatchReader contains:

    // Clone readerOptions for deleteEvents.
    Reader.Options deleteEventReaderOptions = readerOptions.clone();
    // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX because
    // we always want to read all the delete delta files.
    deleteEventReaderOptions.range(0, Long.MAX_VALUE);

This is suboptimal since the base and the deltas are sorted by ROW__ID. For each split of the base we can find the min/max ROW__ID and load only those delete events from the deltas that fall in the [min, max] range. This reduces the number of delete events loaded into memory (to no more than there are rows in the split).
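
The range check described above could look roughly like the following. This is a minimal sketch, not the code from the attached patches: `DeleteEventRangeFilter`, `RowKey`, and `filter` are hypothetical names, and the key is assumed to be a simplified (writeId, bucket, rowId) triple compared in that order.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these are not Hive's actual classes. */
final class DeleteEventRangeFilter {

  /** Simplified stand-in for ROW__ID, assumed to order by writeId, then bucket, then rowId. */
  static final class RowKey implements Comparable<RowKey> {
    final long writeId;
    final int bucket;
    final long rowId;

    RowKey(long writeId, int bucket, long rowId) {
      this.writeId = writeId;
      this.bucket = bucket;
      this.rowId = rowId;
    }

    @Override
    public int compareTo(RowKey other) {
      int c = Long.compare(writeId, other.writeId);
      if (c != 0) {
        return c;
      }
      c = Integer.compare(bucket, other.bucket);
      if (c != 0) {
        return c;
      }
      return Long.compare(rowId, other.rowId);
    }
  }

  /**
   * Keeps only the delete events whose key lies in [minKey, maxKey].
   * The input is assumed to be sorted ascending, as delete deltas are.
   */
  static List<RowKey> filter(List<RowKey> sortedDeleteEvents, RowKey minKey, RowKey maxKey) {
    List<RowKey> kept = new ArrayList<>();
    for (RowKey key : sortedDeleteEvents) {
      if (key.compareTo(maxKey) > 0) {
        break; // input is sorted, so no later event can fall inside the range
      }
      if (key.compareTo(minKey) >= 0) {
        kept.add(key);
      }
    }
    return kept;
  }
}
```

Because the events are sorted, the loop can stop at the first key past the maximum, so memory use is bounded by the number of events inside the split's key range.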

When we support sorting on PK, the same should apply, but we would need to make sure the PKs are stored in the ORC index.

See OrcRawRecordMerger.discoverKeyBounds().

The hive.acid.key.index entry in the ORC footer holds an index of ROW__IDs, so the min/max should be easy to determine for any file written by OrcRecordUpdater.
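
As an illustration of reading bounds from such an index, here is a minimal sketch. It assumes the index value is a string of semicolon-terminated `writeId,bucket,rowId` triples, one per stripe, each giving the last key of that stripe; that layout is an assumption here, not something stated in this issue. `KeyIndexSketch`, `parse`, and `maxKey` are hypothetical names. The sketch derives only the upper bound.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the index layout is an assumption, see the text above. */
final class KeyIndexSketch {

  /** Parses "w,b,r;w,b,r;" into a list of {writeId, bucket, rowId} triples. */
  static List<long[]> parse(String keyIndex) {
    List<long[]> keys = new ArrayList<>();
    for (String entry : keyIndex.split(";")) {
      if (entry.isEmpty()) {
        continue; // an empty index yields a single empty token
      }
      String[] parts = entry.split(",");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Malformed key index entry: " + entry);
      }
      keys.add(new long[] {
          Long.parseLong(parts[0]), Long.parseLong(parts[1]), Long.parseLong(parts[2])});
    }
    return keys;
  }

  /** Returns the last key recorded in the index (the file's upper bound), or null if empty. */
  static long[] maxKey(String keyIndex) {
    List<long[]> keys = parse(keyIndex);
    return keys.isEmpty() ? null : keys.get(keys.size() - 1);
  }
}
```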

Attachments

- HIVE-16812.02.patch, 21/Sep/18 02:09, 30 kB, Eugene Koifman
- HIVE-16812.04.patch, 25/Sep/18 01:12, 61 kB, Eugene Koifman
- HIVE-16812.05.patch, 25/Sep/18 22:31, 70 kB, Eugene Koifman
- HIVE-16812.06.patch, 26/Sep/18 23:17, 74 kB, Eugene Koifman
- HIVE-16812.07.patch, 27/Sep/18 20:05, 74 kB, Eugene Koifman

Issue Links

blocks

HIVE-20635 VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files

Closed

causes

HIVE-22318 Java.io.exception:Two readers for

Open

HIVE-23143 Transactions: PPD in Delete deltas is broken

Open

is blocked by

HIVE-18662 hive.acid.key.index is missing entries

Closed

is related to

HIVE-20694 Additional unit tests for VectorizedOrcAcidRowBatchReader min max key evaluation

Closed

HIVE-17320 OrcRawRecordMerger.discoverKeyBounds logic can be simplified

Open

HIVE-17284 remove OrcRecordUpdater.deleteEventIndexBuilder

Resolved

HIVE-17458 VectorizedOrcAcidRowBatchReader doesn't handle 'original' files

Closed

relates to

HIVE-17231 ColumnizedDeleteEventRegistry.DeleteReaderValue optimization

Closed

HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching

Resolved

HIVE-19985 ACID: Skip decoding the ROW__ID sections for read-only queries

Closed

HIVE-20604 Minor compaction disables ORC column stats

Closed

links to

Review Board

People

Assignee: Eugene Koifman

Reporter: Eugene Koifman

Votes: 0

Watchers: 7

Dates

Created: 02/Jun/17 01:45

Updated: 17/Nov/22 08:54

Resolved: 27/Sep/18 20:28