[ARROW-16638] [Go][Parquet] Boolean column reader fails to skip rows - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 9.0.0
Component/s: Go
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/20257
Language:
- Go

Description

Skipping values in the Go Parquet column reader is effectively implemented by reading the target number of rows into scratch space, which is then discarded. In the boolean case, `BytesRequired` returns a scratch buffer size of one bit per row; however, the reader also attempts to use that same scratch space for `defLvls` and `repLvls` (both int16), which require two bytes per row. Since the boolean `values` buffer is not large enough to hold the same number of rows' worth of def and rep levels, skipping too many rows results in an index-out-of-bounds panic.

Note that this does not seem to be an issue for other column types, since the buffer needed for `values` is always larger than the buffer needed for def and rep levels. Still, there seems to be no reason to pass any non-nil value to `cr.ReadBatch(...)` for the rep and def levels when skipping any column in the reader.
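The size mismatch described above can be illustrated with a small standalone sketch. It does not use the Arrow Go library; the helper names `boolScratchBytes`, `int32ScratchBytes`, `levelBytes`, and `levelsThatFit` are hypothetical and only model the arithmetic (one bit per boolean row, four bytes per int32 row, two bytes per int16 level), assuming the scratch buffer is sized purely from the value type.

```go
package main

import "fmt"

// boolScratchBytes is a hypothetical stand-in for the size of a bit-packed
// boolean scratch buffer: one bit per row, rounded up to whole bytes.
func boolScratchBytes(rows int) int {
	return (rows + 7) / 8
}

// int32ScratchBytes models a scratch buffer for an int32 column:
// four bytes per row.
func int32ScratchBytes(rows int) int {
	return rows * 4
}

// levelBytes is the space needed to hold one int16 def or rep level per row.
func levelBytes(rows int) int {
	return rows * 2
}

// levelsThatFit reports how many int16 levels fit in a buffer of bufBytes.
func levelsThatFit(bufBytes int) int {
	return bufBytes / 2
}

func main() {
	const rows = 1024 // number of rows we want to skip in one batch

	boolBuf := boolScratchBytes(rows)
	int32Buf := int32ScratchBytes(rows)

	fmt.Printf("bytes needed for %d int16 levels: %d\n", rows, levelBytes(rows))
	fmt.Printf("boolean scratch: %d bytes, room for %d levels\n",
		boolBuf, levelsThatFit(boolBuf))
	fmt.Printf("int32 scratch:   %d bytes, room for %d levels\n",
		int32Buf, levelsThatFit(int32Buf))

	// Reusing the boolean scratch for levels would index past its end
	// whenever the batch has more rows than the buffer can hold as int16.
	if levelsThatFit(boolBuf) < rows {
		fmt.Println("boolean scratch is too small to double as a level buffer")
	}
}
```

In this model the boolean scratch buffer holds only a sixteenth of the levels that a full batch would need, while the int32 buffer has room to spare, which matches the observation that only the boolean reader panics.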

Issue Links

links to

GitHub Pull Request #13221

GitHub Pull Request #13277

People

Assignee: Unassigned

Reporter: Matt DePero

Votes: 0

Watchers: 3

Dates

Created: 23/May/22 23:43

Updated: 11/Jan/23 11:45

Resolved: 09/Jun/22 15:18

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3h 10m