[IMPALA-4177] Add batch dictionary/RLE decoding in Parquet - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: Impala 2.8.0
Fix Version/s: Impala 2.11.0
Component/s: Backend
Labels:
- perf

Target Version: Product Backlog

Description

parquet-cpp implemented this optimisation here: https://github.com/apache/parquet-cpp/pull/140/commits/3f10378c5fc56c346ce77bf9e9faf011ead9c5e6

The basic idea is to add a batched interface to DictDecoder and RleDecoder, and support passing in a dictionary to RleDecoder. It should then be possible to significantly optimise the decoding.

We should add a microbenchmark for DictDecoder and update the benchmark for RleDecoder so we can understand the performance impact.
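
To illustrate the batched interface proposed in this description, here is a minimal sketch, not the Impala or parquet-cpp implementation. It assumes the RLE stream has already been split into runs; real Parquet data uses a bit-packed/RLE hybrid byte encoding, which is skipped here. The names `Run`, `BatchedDictRleDecoder` and `GetBatch` are hypothetical. The point it shows is that a repeated run needs only one dictionary lookup followed by a fill, rather than one call and one lookup per value.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of an RLE stream already split into runs.
// This is an assumption for illustration; it is not the on-disk Parquet format.
struct Run {
  bool repeated;                   // true: one index repeated `count` times
  uint32_t index;                  // dictionary index of a repeated run
  std::size_t count;               // length of a repeated run
  std::vector<uint32_t> literals;  // dictionary indices of a literal run
};

// Illustrative batched decoder that takes the dictionary as an input.
// The caller must keep `runs` and `dict` alive while the decoder is in use.
template <typename T>
class BatchedDictRleDecoder {
 public:
  BatchedDictRleDecoder(const std::vector<Run>& runs, const std::vector<T>& dict)
      : runs_(runs), dict_(dict) {}

  // Decodes up to `n` values into `out` and returns the number written.
  // Returns fewer than `n` at end of stream or on an out-of-range index;
  // after an out-of-range index the decoder should not be reused.
  std::size_t GetBatch(T* out, std::size_t n) {
    std::size_t produced = 0;
    while (produced < n && run_idx_ < runs_.size()) {
      const Run& run = runs_[run_idx_];
      const std::size_t run_len = run.repeated ? run.count : run.literals.size();
      const std::size_t take = std::min(n - produced, run_len - pos_in_run_);
      if (run.repeated) {
        // One bounds check and one lookup for the whole run, then a fill.
        if (run.index >= dict_.size()) return produced;
        std::fill(out + produced, out + produced + take, dict_[run.index]);
      } else {
        // Literal run: translate each index in a tight loop.
        for (std::size_t i = 0; i < take; ++i) {
          const uint32_t idx = run.literals[pos_in_run_ + i];
          if (idx >= dict_.size()) return produced + i;
          out[produced + i] = dict_[idx];
        }
      }
      produced += take;
      pos_in_run_ += take;
      if (pos_in_run_ == run_len) {  // run exhausted, move to the next one
        ++run_idx_;
        pos_in_run_ = 0;
      }
    }
    return produced;
  }

 private:
  const std::vector<Run>& runs_;
  const std::vector<T>& dict_;
  std::size_t run_idx_ = 0;     // run currently being decoded
  std::size_t pos_in_run_ = 0;  // values already consumed from that run
};
```

For example, with the dictionary `{10, 20, 30}`, a repeated run of index 1 with length 5, and a literal run `{0, 2, 1}`, a first `GetBatch(out, 4)` writes `20, 20, 20, 20`, and a second `GetBatch(out, 10)` writes the remaining `20, 10, 30, 20` and returns 4. A batch may stop partway through a run, so the decoder keeps its position between calls.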

Issue Links

breaks

IMPALA-6217 parquet-column-readers.cc:417] Check failed: def_levels_.CacheHasNext() (Resolved)

IMPALA-6946 Hit DCHECK in impala::RleBatchDecoder<unsigned int>::GetRepeatedValue (Resolved)

is a child of

IMPALA-4123 Columnar decoding in Parquet scanner (Resolved)

People

Assignee: Tim Armstrong
Reporter: Tim Armstrong
Votes: 0
Watchers: 2

Dates

Created: 21/Sep/16 17:40
Updated: 22/May/18 13:23
Resolved: 16/Nov/17 21:26