[HIVE-14233] Improve vectorization for ACID by eliminating row-by-row stitching - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.3.0
Component/s: Transactions, Vectorization
Labels: TODOC2.3
Target Version/s: 2.2.0

Description

This JIRA proposes to improve vectorization for ACID by eliminating row-by-row stitching when reading back ACID files. In the current implementation, a vectorized row batch is populated one row at a time before it is passed up the operator pipeline. This row-by-row stitching was necessary because the ACID insert/update/delete events from the various delta files had to be merged before the current version of a given row could be determined. HIVE-14035 removed that limitation by splitting ACID update events into a delete followed by an insert, which also makes it possible to create splits on delta files.
Building on HIVE-14035, this JIRA proposes to remove this bottleneck in the vectorized code path for ACID by reading row batches directly from the underlying ORC files, with no stitching at all. Once a row batch is read from the split (which may be on a base or delta file), deleted rows are identified by cross-referencing the batch against a data structure that tracks only delete events (found in the delete_delta files). This should yield a large performance gain when reading ACID files in vectorized fashion, and it enables further optimizations on top of this change.
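
The cross-referencing step described above can be illustrated with a minimal, self-contained Java sketch. It is not the code from the attached patches: the class names (DeleteFilterSketch, RowKey, Batch), the in-memory HashSet of delete events, and the use of (original transaction, bucket, row id) as the row key are all assumptions made for illustration. The idea it shows is that a whole batch is read as-is and deleted rows are dropped by rewriting a selection vector, so no row is copied or stitched.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch; these types are illustrative and are not Hive classes. */
final class DeleteFilterSketch {

    /** Assumed row identity: (original transaction, bucket, row id). */
    static final class RowKey {
        final long originalTransaction;
        final int bucket;
        final long rowId;

        RowKey(long originalTransaction, int bucket, long rowId) {
            this.originalTransaction = originalTransaction;
            this.bucket = bucket;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowKey)) {
                return false;
            }
            RowKey k = (RowKey) o;
            return originalTransaction == k.originalTransaction
                    && bucket == k.bucket
                    && rowId == k.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(originalTransaction, bucket, rowId);
        }
    }

    /** Minimal columnar batch: one array per key column plus a selection vector. */
    static final class Batch {
        final long[] originalTransaction;
        final int[] bucket;
        final long[] rowId;
        final int[] selected; // indices of rows that are still visible
        int size;             // number of valid entries in selected

        Batch(long[] originalTransaction, int[] bucket, long[] rowId) {
            this.originalTransaction = originalTransaction;
            this.bucket = bucket;
            this.rowId = rowId;
            this.size = rowId.length;
            this.selected = new int[size];
            for (int i = 0; i < size; i++) {
                selected[i] = i; // initially every row is selected
            }
        }
    }

    /** Drops deleted rows by compacting the selection vector in place. */
    static void applyDeletes(Batch batch, Set<RowKey> deleteEvents) {
        int kept = 0;
        for (int j = 0; j < batch.size; j++) {
            int i = batch.selected[j];
            RowKey key = new RowKey(
                    batch.originalTransaction[i], batch.bucket[i], batch.rowId[i]);
            if (!deleteEvents.contains(key)) {
                batch.selected[kept++] = i;
            }
        }
        batch.size = kept;
    }

    /** Returns the indices of the rows that survived the delete filter. */
    static int[] visibleRows(Batch batch) {
        return Arrays.copyOf(batch.selected, batch.size);
    }

    public static void main(String[] args) {
        // Delete events, as they might be loaded from the delete delta files.
        Set<RowKey> deleteEvents = new HashSet<>();
        deleteEvents.add(new RowKey(5L, 0, 1L));
        deleteEvents.add(new RowKey(5L, 0, 3L));

        // A four-row batch read directly from a base or delta split.
        Batch batch = new Batch(
                new long[] {5L, 5L, 5L, 5L},
                new int[] {0, 0, 0, 0},
                new long[] {0L, 1L, 2L, 3L});

        applyDeletes(batch, deleteEvents);
        System.out.println(Arrays.toString(visibleRows(batch))); // prints [0, 2]
    }
}
```

A hash set keeps the sketch short. Because it holds every delete event in memory, a real reader would likely need a more compact or sorted structure for tables with many deletes; that trade-off is outside the scope of this illustration.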

Attachments

- HIVE-14233.12.patch, 30/Aug/16 22:26, 73 kB, Saket Saurabh
- HIVE-14233.11.patch, 30/Aug/16 20:10, 73 kB, Saket Saurabh
- HIVE-14233.10.patch, 29/Aug/16 16:51, 69 kB, Saket Saurabh
- HIVE-14233.09.patch, 22/Aug/16 13:51, 65 kB, Saket Saurabh
- HIVE-14233.08.patch, 12/Aug/16 22:39, 45 kB, Saket Saurabh
- HIVE-14233.07.patch, 11/Aug/16 23:12, 45 kB, Saket Saurabh
- HIVE-14233.06.patch, 09/Aug/16 20:53, 46 kB, Saket Saurabh
- HIVE-14233.05.patch, 26/Jul/16 23:26, 44 kB, Saket Saurabh
- HIVE-14233.04.patch, 26/Jul/16 20:00, 44 kB, Saket Saurabh
- HIVE-14233.03.patch, 26/Jul/16 05:33, 44 kB, Saket Saurabh
- HIVE-14233.02.patch, 21/Jul/16 23:21, 42 kB, Saket Saurabh
- HIVE-14233.01.patch, 14/Jul/16 18:02, 28 kB, Saket Saurabh

Issue Links

blocks: HIVE-17458 VectorizedOrcAcidRowBatchReader doesn't handle 'original' files (Closed)
is related to: HIVE-16812 VectorizedOrcAcidRowBatchReader doesn't filter delete events (Closed)
requires: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions (Resolved)
links to: ReviewBoard # 50934

People

Assignee: Saket Saurabh
Reporter: Saket Saurabh
Votes: 0
Watchers: 9

Dates

Created: 14/Jul/16 00:18
Updated: 06/Sep/17 01:23
Resolved: 31/Aug/16 03:39