From issue https://github.com/apache/arrow/issues/14229.
The bug looks like this:
- create a pandas dataframe with one column and n rows, n < max(int32)
- each element is a list with m integers, m * n > max(int32)
- save to a parquet file
- reading from the parquet file fails with "OSError: List index overflow"
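
The size condition in the steps above can be checked with plain arithmetic. This is only a sketch: the helper name `overflows_int32_offsets` and the sizes `n = 3_000_000`, `m = 1_000` are illustrative assumptions, not values taken from the issue.

```python
INT32_MAX = 2**31 - 1  # 2_147_483_647, the largest value an int32 offset can hold


def overflows_int32_offsets(n_rows: int, list_len: int) -> bool:
    """Return True if n_rows lists of list_len items hold more than INT32_MAX values in total."""
    return n_rows * list_len > INT32_MAX


# Illustrative sizes (assumptions, not from the issue).
n, m = 3_000_000, 1_000
print(n < INT32_MAX)                  # the row count alone fits in int32
print(overflows_int32_offsets(n, m))  # the total number of list items does not
```

Actually building a dataframe of this size needs several gigabytes of memory, so the check above only confirms that a given (n, m) pair falls into the failing range.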
See the comment below for details on how to reproduce this bug:
Based on testing with a small dataset, the error appears to come from the code below.
OffsetType is int32, but the loop is executed (and *offset is incremented) m * n times, which exceeds max(int32).
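
The failure mode can be modelled in a few lines of Python. This is a simplified sketch, not Arrow's actual code: `build_offsets` and its `offset_max` parameter are hypothetical names, and the sketch adds each list's length in one step instead of incrementing once per element. It shows how a running offset bounded by int32 can overflow even though the number of rows is small.

```python
INT32_MAX = 2**31 - 1


def build_offsets(list_lengths, offset_max=INT32_MAX):
    """Accumulate list offsets, failing once the running total exceeds offset_max.

    Hypothetical model of an int32-bounded offset counter; not Arrow's implementation.
    """
    offsets = [0]
    for length in list_lengths:
        next_offset = offsets[-1] + length
        if next_offset > offset_max:
            # Mirrors the message reported in the issue.
            raise OverflowError("List index overflow")
        offsets.append(next_offset)
    return offsets


# A tiny limit stands in for INT32_MAX so the overflow is cheap to trigger.
print(build_offsets([3, 3], offset_max=8))
try:
    build_offsets([3, 3, 3], offset_max=8)
except OverflowError as exc:
    print(exc)
```

With the default limit, three rows are enough to overflow if each list is large: the row count stays far below max(int32) while the accumulated offset does not.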