Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
From issue https://github.com/apache/arrow/issues/14229.
The bug looks like this:
- create a pandas DataFrame with one column and n rows, where n < max(int32)
- each element is a list of m integers, where m * n > max(int32)
- save to a parquet file
- reading from the parquet file fails with "OSError: List index overflow"
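The size conditions in the steps above can be checked with a little arithmetic before building a large DataFrame. The following is only a sketch: the helper names `triggers_overflow` and `min_rows_for_overflow` are hypothetical and not part of pandas or pyarrow.

```python
# max(int32), the limit referred to in the steps above.
INT32_MAX = 2**31 - 1  # 2147483647


def triggers_overflow(n_rows: int, list_len: int) -> bool:
    """Return True when the row count fits in int32 but the total
    number of list elements (n_rows * list_len) does not."""
    return n_rows < INT32_MAX and n_rows * list_len > INT32_MAX


def min_rows_for_overflow(list_len: int) -> int:
    """Smallest n such that n * list_len exceeds max(int32)."""
    return INT32_MAX // list_len + 1


if __name__ == "__main__":
    m = 1000  # hypothetical list length per row
    n = min_rows_for_overflow(m)
    print(n, triggers_overflow(n, m))
```

With lists of 1000 integers, about 2.15 million rows are enough to push the total element count past max(int32), so the row count itself stays far below the limit.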
See the comment below for details on how to reproduce this bug:
https://github.com/apache/arrow/issues/14229#issuecomment-1272223773
Based on testing with a small dataset, the error appears to come from the code below:
https://github.com/apache/arrow/blob/master/cpp/src/parquet/level_conversion.cc#L63-L64
OffsetType is int32, but the loop is executed (and *offset is incremented) m * n times, which exceeds max(int32).
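To illustrate the suspected mechanism, here is a small Python model of a running list offset that is incremented once per element and guarded against exceeding a maximum. It is a simplified sketch, not the Arrow C++ code: the function `count_offsets` and its `offset_max` parameter are hypothetical, and the limit is made configurable so the behavior can be observed without iterating over two billion elements.

```python
INT32_MAX = 2**31 - 1  # limit of a 32-bit signed offset


def count_offsets(list_lengths, offset_max=INT32_MAX):
    """Build cumulative list offsets, one increment per element.

    Raises OverflowError when another increment would push the
    running offset past offset_max, mimicking the reported
    "List index overflow" failure.
    """
    offsets = [0]
    running = 0
    for length in list_lengths:
        for _ in range(length):
            if running == offset_max:
                raise OverflowError("List index overflow")
            running += 1
        offsets.append(running)
    return offsets


if __name__ == "__main__":
    # Two rows with 2 and 3 elements fit under a limit of 10.
    print(count_offsets([2, 3], offset_max=10))
    # Three rows of 4 elements (12 total) exceed the same limit.
    try:
        count_offsets([4, 4, 4], offset_max=10)
    except OverflowError as exc:
        print("error:", exc)
```

In this model the number of rows never matters on its own; only the total element count across all rows is compared against the offset limit, which matches the n < max(int32), m * n > max(int32) condition described above.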