[HIVE-17915] Enable VectorizedOrcAcidRowBatchReader to be used with LLAP IO elevator over original acid files - ASF JIRA

Details

Type: Sub-task
Status: In Progress
Priority: Critical
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: Transactions
Labels: None
Target Version/s: 3.0.0

Description

Since ~~HIVE-12631~~, LLAP IO can support Acid tables, but not when reading "original" files.
~~HIVE-17458~~ enables VectorizedOrcAcidRowBatchReader to vectorize reads over "original" files, but not with LLAP IO.

The current implementation of OrcSplit.canUseLlapIo() is the same as in ~~HIVE-12631~~.
This can and should be improved. There are 2 parts to this:

The 1st issue is reads of "original" files where the data doesn't need to be decorated with ROW_ID (see VectorizedOrcAcidRowBatchReader.canUseLlapForAcid()). In that case VectorizedOrcAcidRowBatchReader as of ~~HIVE-17458~~ should be usable with LLAP IO, but when I tried it I got ArrayIndexOutOfBoundsException in various places of the stack.
This is the more important one.

The 2nd issue is that reading "original" acid files (when ROW_IDs are needed) requires using org.apache.hadoop.hive.ql.io.orc.RecordReader.getRowNumber() in VectorizedOrcAcidRowBatchReader.
This API is not available on the reader that LlapRecordReader provides.

It would be better if getRowNumber() were available, both for performance and for simpler logic in the code.
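
To illustrate why a position API simplifies the logic, here is a minimal, self-contained sketch. It does not use Hive or ORC classes: PositionAwareReader, BatchOnlyReader, ToyReader and both helper methods are hypothetical stand-ins invented for this example. It assumes that row ids for an "original" file are simply consecutive row positions, and compares deriving them from a reader that reports its position with deriving them from a caller-maintained counter.

```java
import java.util.ArrayList;
import java.util.List;

public class RowNumberSketch {

    /** Hypothetical reader that exposes its position (stand-in for a getRowNumber()-style API). */
    interface PositionAwareReader {
        long getRowNumber();   // index of the first row of the next batch
        int nextBatchSize();   // rows in the next batch, or 0 when exhausted
    }

    /** Hypothetical reader with no position API; it only hands back batches. */
    interface BatchOnlyReader {
        int nextBatchSize();
    }

    /** Toy in-memory reader implementing both interfaces, for demonstration only. */
    static final class ToyReader implements PositionAwareReader, BatchOnlyReader {
        private final int[] batchSizes;
        private int next = 0;
        private long rowNumber;

        ToyReader(long firstRow, int... batchSizes) {
            this.rowNumber = firstRow;
            this.batchSizes = batchSizes;
        }

        @Override public long getRowNumber() { return rowNumber; }

        @Override public int nextBatchSize() {
            if (next >= batchSizes.length) return 0;
            int size = batchSizes[next++];
            rowNumber += size;   // advance past the batch just returned
            return size;
        }
    }

    /** With a position API, the reader itself says where each batch starts. */
    static List<Long> rowIdsWithPosition(PositionAwareReader reader) {
        List<Long> ids = new ArrayList<>();
        while (true) {
            long start = reader.getRowNumber();
            int size = reader.nextBatchSize();
            if (size == 0) break;
            for (int i = 0; i < size; i++) ids.add(start + i);
        }
        return ids;
    }

    /** Without it, the caller must know where the split starts and count rows itself. */
    static List<Long> rowIdsWithCounter(BatchOnlyReader reader, long firstRowOfSplit) {
        List<Long> ids = new ArrayList<>();
        long counter = firstRowOfSplit;
        int size;
        while ((size = reader.nextBatchSize()) > 0) {
            for (int i = 0; i < size; i++) ids.add(counter++);
        }
        return ids;
    }
}
```

In this toy setup both approaches produce the same ids, but the counter variant depends on the caller being handed the correct starting row and on never skipping rows without updating the counter. That is the kind of extra bookkeeping a getRowNumber() on the LLAP-provided reader would avoid.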

cc sershe, teddy.choi

Issue Links

is related to

HIVE-17944 OrcSplit.canUseLlapIo() disables LLAP IO for non-vectorized acid reads

Resolved

People

Assignee: Teddy Choi

Reporter: Eugene Koifman

Votes: 0

Watchers: 0

Dates

Created: 26/Oct/17 23:10

Updated: 17/Jan/18 02:47