[IMPALA-6700] Improve error handling of Parquet RLE decoding - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels:
- parquet

Epic Color:
ghx-label-1

Description

RleBatchDecoder's error handling should be stricter and more explicit in my opinion. My main problem with the current solution is readability - it is hard to tell at the first glance what will be the consequences of different decoding errors.

There are 4 things that can go wrong during RLE decoding:
1. repeated / literal run found with count 0 (now this leads to error in some cases, while is skipped in some others)
2. literal run that exceeds the input (currently GetLiteralValues returns false in this case)
3. more values are expected, but there are no more runs in the input (currently this has to be checked by the caller)
4. there shouldn't be any more value, but the input still has more bytes (this is currently not checked)

1 and 2 should set RleBatchDecoder to an error state, while there should be a function to check 4 (e.g. STATUS CheckIfAtTheEnd() ). Handling of 3 is ok as it is.

Changing 1 and 4 would mean that some Parquet pages that are currently accepted would be considered corrupt.

1 is a valid change in my opinion, as parquet-mr also doesn't handle 0 counts - 0 sized literal runs wil lead to exception, while 0 sizes repeated runs will lead to incorrect results, see https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/rle/RunLengthBitPackingHybridDecoder.java
4 could be useful to detect errors in encoders / corrupted data - nearly any random buffer can be decoded as RLE, but it is unlikely that the number of encoded values will match the number of expected values

Attachments

Activity

People

Assignee:: Unassigned

Reporter:: Csaba Ringhofer

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 19/Mar/18 16:34

Updated:: 19/Mar/18 17:43