[HIVE-17458] VectorizedOrcAcidRowBatchReader doesn't handle 'original' files - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.2.0
Fix Version/s: 3.0.0
Component/s: Transactions
Labels: None
Target Version/s: 3.0.0

Description

VectorizedOrcAcidRowBatchReader is not used for 'original' files. This will likely look like a performance regression when a table is converted from non-acid to acid, until the table goes through a major compaction.

With Load Data support, if large files are added via Load Data, reads of those files will not vectorize until major compaction.

There is no reason why this should be the case. Just like OrcRawRecordMerger, VectorizedOrcAcidRowBatchReader can look at the other files in the logical tranche/bucket and calculate the offset for the RowBatch of the split. (Presumably getRecordReader().getRowNumber() works the same in vector mode).
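
The offset calculation described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `OriginalFileOffsets`, `OriginalFile`, `rowOffsetFor` and `rowIdsForBatch` are hypothetical names, and ordering the files of a bucket by name is an assumption. The idea is that the first row of a file gets a row number equal to the total row count of the files that precede it in the same logical bucket, and each row in a batch adds its position within the file to that offset.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OriginalFileOffsets {

    /** Hypothetical stand-in for one 'original' file of a logical bucket. */
    public static final class OriginalFile {
        final String name;
        final long rowCount;

        public OriginalFile(String name, long rowCount) {
            this.name = name;
            this.rowCount = rowCount;
        }
    }

    /**
     * Returns the number of rows in all files of the bucket that sort
     * before the file being read. Sorting by name is an assumption made
     * for this sketch.
     */
    public static long rowOffsetFor(List<OriginalFile> bucketFiles, String splitFileName) {
        List<OriginalFile> sorted = new ArrayList<>(bucketFiles);
        sorted.sort(Comparator.comparing((OriginalFile f) -> f.name));
        long offset = 0;
        for (OriginalFile f : sorted) {
            if (f.name.equals(splitFileName)) {
                return offset;
            }
            offset += f.rowCount;
        }
        throw new IllegalArgumentException("File not in bucket: " + splitFileName);
    }

    /**
     * Assigns row numbers to one batch. firstRowInFile plays the role of
     * the row number reported by the underlying reader at the start of
     * the batch.
     */
    public static long[] rowIdsForBatch(long fileOffset, long firstRowInFile, int batchSize) {
        long[] ids = new long[batchSize];
        for (int i = 0; i < batchSize; i++) {
            ids[i] = fileOffset + firstRowInFile + i;
        }
        return ids;
    }
}
```

For example, if a bucket holds `000000_0` with 100 rows and `000000_0_copy_1` with 50 rows, a split over the second file starts at offset 100, so the batch that begins at row 10 of that file is numbered from 110.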

In this case we don't even need OrcSplit.isOriginal(): the reader can infer it from the file path, which in particular simplifies OrcInputFormat.determineSplitStrategies().
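
One plausible reading of "infer it from the file path" is sketched below. It assumes that a file written in the acid layout sits directly inside a directory whose name starts with `base_`, `delta_` or `delete_delta_`, and treats anything else as original. The class and method names are hypothetical, and the rule actually used by the patch is not stated here and may need to cover more cases.

```java
public class OriginalFileCheck {

    /**
     * Guesses whether a data file is an 'original' file by looking at the
     * name of its parent directory. Assumption for this sketch: acid files
     * live directly under base_*, delta_* or delete_delta_* directories.
     */
    public static boolean looksOriginal(String filePath) {
        String[] parts = filePath.split("/");
        if (parts.length < 2) {
            // No parent directory to inspect; treat the file as original.
            return true;
        }
        String parent = parts[parts.length - 2];
        boolean inAcidDir = parent.startsWith("base_")
                || parent.startsWith("delta_")
                || parent.startsWith("delete_delta_");
        return !inAcidDir;
    }
}
```

With this check, a path such as `/warehouse/t/000000_0` is classified as original, while `/warehouse/t/delta_0000001_0000001/bucket_00000` is not.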

Attachments

HIVE-17458.16.patch	03/Nov/17 17:02	125 kB	Eugene Koifman
HIVE-17458.15.patch	01/Nov/17 18:22	123 kB	Eugene Koifman
HIVE-17458.14.patch	01/Nov/17 14:24	122 kB	Eugene Koifman
HIVE-17458.13.patch	31/Oct/17 22:08	122 kB	Eugene Koifman
HIVE-17458.12.patch	30/Oct/17 21:58	145 kB	Eugene Koifman
HIVE-17458.12.patch	30/Oct/17 22:13	145 kB	Eugene Koifman
HIVE-17458.11.patch	27/Oct/17 21:15	140 kB	Eugene Koifman
HIVE-17458.10.patch	27/Oct/17 01:46	140 kB	Eugene Koifman
HIVE-17458.09.patch	25/Oct/17 15:26	105 kB	Eugene Koifman
HIVE-17458.08.patch	25/Oct/17 02:25	105 kB	Eugene Koifman
HIVE-17458.07.patch	24/Oct/17 19:41	45 kB	Eugene Koifman
HIVE-17458.07.patch	24/Oct/17 21:10	88 kB	Eugene Koifman
HIVE-17458.06.patch	24/Oct/17 05:48	37 kB	Eugene Koifman
HIVE-17458.05.patch	23/Oct/17 19:32	37 kB	Eugene Koifman
HIVE-17458.04.patch	21/Oct/17 00:29	35 kB	Eugene Koifman
HIVE-17458.03.patch	20/Oct/17 02:05	24 kB	Eugene Koifman
HIVE-17458.02.patch	19/Oct/17 00:23	19 kB	Eugene Koifman
HIVE-17458.01.patch	18/Oct/17 01:08	13 kB	Eugene Koifman

Issue Links

is blocked by
HIVE-17923	'cluster by' should not be needed for a bucketed table	Resolved
HIVE-12631	LLAP IO: support ORC ACID tables	Closed
HIVE-14233	Improve vectorization for ACID by eliminating row-by-row stitching	Resolved

is related to
HIVE-14878	integrate MM tables into ACID: add separate ACID type	Resolved
HIVE-17854	LlapRecordReader should have getRowNumber() like org.apache.orc.RecordReader	Open

relates to
HIVE-16812	VectorizedOrcAcidRowBatchReader doesn't filter delete events	Closed
HIVE-14035	Enable predicate pushdown to delta files created by ACID Transactions	Resolved
HIVE-18045	can VectorizedOrcAcidRowBatchReader be used all the time	Open

links to
Review Board

Sub-Tasks

1.	can VectorizedOrcAcidRowReader be removed once HIVE-17458 is done?	Closed	Eugene Koifman
2.	Enable VectorizedOrcAcidRowBatchReader to be used with LLAP IO elevator over original acid files	In Progress	Teddy Choi
3.	VectorizedOrcAcidRowBatchReader.computeOffsetAndBucket optimization	Closed	Saurabh Seth
4.	Aggregation with struct in LLAP produces wrong result	Closed	Saurabh Seth
5.	Enable runWorker() UDF to launch compactor from .q tests	Open	Unassigned
6.	select ROW__ID, t, si, i from over10k_orc_bucketed where b = 4294967363 and t < 100 order by ROW__ID fails on LLAP	Closed	Eugene Koifman
7.	OrcSplit.canUseLlapIo() disables LLAP IO for non-vectorized acid reads	Resolved	Unassigned
8.	clean up acid_vectorization_original.q	Closed	Eugene Koifman

People

Assignee: Eugene Koifman
Reporter: Eugene Koifman
Votes: 0
Watchers: 6

Dates

Created: 06/Sep/17 01:22
Updated: 22/May/18 23:16
Resolved: 04/Nov/17 17:15
