Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Impala 2.8.0
Description
parquet-cpp implemented this optimisation here: https://github.com/apache/parquet-cpp/pull/140/commits/3f10378c5fc56c346ce77bf9e9faf011ead9c5e6
The basic idea is to add a batched interface to DictDecoder and RleDecoder, and support passing in a dictionary to RleDecoder. It should then be possible to significantly optimise the decoding.
We should add a microbenchmark for DictDecoder and update the benchmark for RleDecoder so we can understand the performance impact.
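As a rough illustration of the batched idea described above, here is a minimal sketch, not the Impala or parquet-cpp implementation. The names `RleRun`, `BatchedDictRleDecoder` and `GetBatch` are hypothetical, and the input is assumed to be a list of already-parsed (run length, dictionary index) pairs rather than a real RLE/bit-packed byte stream. The point it shows is that, with the dictionary passed to the decoder, a repeated run needs one dictionary lookup and one fill instead of a lookup per value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified input: one repeated run of a dictionary index.
// A real decoder would parse these runs from an encoded byte stream.
struct RleRun {
  uint32_t count;  // number of repetitions in this run
  uint32_t index;  // dictionary index repeated `count` times
};

// Hypothetical sketch of a decoder that owns both the runs and the dictionary
// and hands out decoded values in batches.
template <typename T>
class BatchedDictRleDecoder {
 public:
  BatchedDictRleDecoder(std::vector<RleRun> runs, std::vector<T> dict)
      : runs_(std::move(runs)), dict_(std::move(dict)) {}

  // Writes up to `max_values` decoded values into `out` and returns the number
  // written. Returns a short count at end of input or when a run refers to an
  // index outside the dictionary (treated here as corrupt input).
  size_t GetBatch(T* out, size_t max_values) {
    assert(out != nullptr || max_values == 0);
    size_t written = 0;
    while (written < max_values && run_idx_ < runs_.size()) {
      const RleRun& run = runs_[run_idx_];
      if (run.index >= dict_.size()) break;  // invalid dictionary index
      const size_t remaining_in_run = run.count - consumed_in_run_;
      const size_t n = std::min(remaining_in_run, max_values - written);
      // One dictionary lookup per run, then a bulk fill of the output.
      std::fill_n(out + written, n, dict_[run.index]);
      written += n;
      consumed_in_run_ += n;
      if (consumed_in_run_ == run.count) {
        ++run_idx_;
        consumed_in_run_ = 0;
      }
    }
    return written;
  }

 private:
  std::vector<RleRun> runs_;
  std::vector<T> dict_;
  size_t run_idx_ = 0;          // index of the run currently being consumed
  size_t consumed_in_run_ = 0;  // values already emitted from that run
};
```

A caller would invoke `GetBatch` repeatedly with a fixed-size output buffer until it returns fewer values than requested. A microbenchmark could compare this against a per-value loop over the same runs.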
Issue Links
- breaks
  - IMPALA-6217 parquet-column-readers.cc:417] Check failed: def_levels_.CacheHasNext() (Resolved)
  - IMPALA-6946 Hit DCHECK in impala::RleBatchDecoder<unsigned int>::GetRepeatedValue (Resolved)
- is a child of
  - IMPALA-4123 Columnar decoding in Parquet scanner (Resolved)