Details
- Type: Improvement
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 2.2.0
- Labels: None
Description
VectorizedOrcAcidRowBatchReader will not be used for original files. This will likely look like a perf regression when converting a table from non-acid to acid, until the table runs through a major compaction.
With Load Data support, if large files are added via Load Data, the read ops will not vectorize until major compaction.
There is no reason why this should be the case. Just like OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other files in the logical tranche/bucket and compute the rowId offset for the RowBatch of the split. (Presumably getRecordReader().getRowNumber() works the same way in vector mode.)
In this case we don't even need OrcSplit.isOriginal(): the reader can infer it from the file path, which in particular simplifies OrcInputFormat.determineSplitStrategies().
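To make the proposed offset computation concrete, here is a minimal sketch (not the actual Hive implementation): for an "original" file, its starting rowId can be derived by summing the row counts of the files that precede it in the same logical bucket, the way OrcRawRecordMerger does. The class name, file names, and row counts below are hypothetical; in Hive the per-file row counts would come from the ORC footers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowOffsetSketch {
    // filesInBucket: original files of one bucket, in their canonical order,
    // mapped to their row counts. The starting rowId of a split's file is the
    // total number of rows in the files that come before it.
    static long offsetFor(Map<String, Long> filesInBucket, String splitFile) {
        long offset = 0;
        for (Map.Entry<String, Long> e : filesInBucket.entrySet()) {
            if (e.getKey().equals(splitFile)) {
                return offset; // rows before this file = its starting rowId
            }
            offset += e.getValue();
        }
        throw new IllegalArgumentException("split file not in bucket: " + splitFile);
    }

    public static void main(String[] args) {
        Map<String, Long> bucket = new LinkedHashMap<>();
        bucket.put("000000_0", 100L);
        bucket.put("000000_0_copy_1", 250L); // e.g. a file added via Load Data
        bucket.put("000000_0_copy_2", 40L);
        System.out.println(offsetFor(bucket, "000000_0_copy_2")); // prints 350
    }
}
```

The reader would then add this offset to getRowNumber() within the split's file to synthesize globally consistent ROW__IDs, just as the row-by-row merger does today.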
Attachments
Issue Links
- is blocked by
  - HIVE-17923 'cluster by' should not be needed for a bucketed table (Resolved)
  - HIVE-12631 LLAP IO: support ORC ACID tables (Closed)
  - HIVE-14233 Improve vectorization for ACID by eliminating row-by-row stitching (Resolved)
- is related to
  - HIVE-14878 integrate MM tables into ACID: add separate ACID type (Resolved)
  - HIVE-17854 LlapRecordReader should have getRowNumber() like org.apache.orc.RecordReader (Open)
- relates to
  - HIVE-16812 VectorizedOrcAcidRowBatchReader doesn't filter delete events (Closed)
  - HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions (Resolved)
  - HIVE-18045 can VectorizedOrcAcidRowBatchReader be used all the time (Open)